tweet_parser.getter_methods package

Submodules

tweet_parser.getter_methods.gnip_fields module

tweet_parser.getter_methods.gnip_fields.get_matching_rules(tweet)[source]

Retrieves the matching rules for a tweet with a gnip field enrichment.

Parameters:tweet (Tweet) – the tweet
Returns:a list of {"tag": "user_tag", "value": "rule_value"} dictionaries from standard rulesets, or None if the tweet has no rules or no matching_rules field.

More information on this value at: http://support.gnip.com/enrichments/matching_rules.html

Return type:list
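
As a rough illustration of the lookup described above, the sketch below reads the rules from a plain dictionary. The helper name `sketch_matching_rules` is hypothetical, and the key locations (a top-level `matching_rules` key in original format, a `matching_rules` key nested under `gnip` in activity-streams format) are assumptions made for illustration rather than the library's confirmed implementation.

```python
def sketch_matching_rules(payload):
    """Return the list of matching rules found in a payload dict, or None."""
    # Assumption: original-format payloads carry "matching_rules" at the top level.
    if "matching_rules" in payload:
        return payload["matching_rules"]
    # Assumption: activity-streams payloads nest the rules under the "gnip" key.
    gnip = payload.get("gnip") or {}
    return gnip.get("matching_rules")


original = {"created_at": "Wed May 24 20:17:19 +0000 2017",
            "matching_rules": [{"tag": "user_tag", "value": "rule_value"}]}
no_rules = {"created_at": "Wed May 24 20:17:19 +0000 2017"}

rules = sketch_matching_rules(original)    # the list of rule dicts
missing = sketch_matching_rules(no_rules)  # None, since no rules are present
```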

tweet_parser.getter_methods.tweet_counts module

Tweet counts and related attributes

This module holds attributes related to basic counts on tweets, such as retweets, favs, and quotes. It is unlikely to be extended.

tweet_parser.getter_methods.tweet_counts.get_favorite_count(tweet)[source]

Gets the favorite count for this tweet.

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:The number of times the Tweet has been favorited
Return type:int

Example

>>> from tweet_parser.getter_methods.tweet_counts import get_favorite_count
>>> tweet = {'created_at': '2017-21-23T15:21:21.000Z',
...          'id_str': '2382763597',
...          'favorite_count': 2}
>>> get_favorite_count(tweet)
2
>>> activity_streams_tweet = {'postedTime': '2017-05-24T20:17:19.000Z',
...                           'favoritesCount': 3}
>>> get_favorite_count(activity_streams_tweet)
3
tweet_parser.getter_methods.tweet_counts.get_quote_count(tweet)[source]

Gets the quote count for this tweet.

Note that this is unavailable in activity-streams format

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:The number of times the Tweet has been quoted. Raises a NotAvailableError for activity-streams format.
Return type:int

Example

>>> from tweet_parser.getter_methods.tweet_counts import get_quote_count
>>> tweet = {'created_at': '2017-21-23T15:21:21.000Z',
...          'id_str': '2382763597',
...          'quote_count': 2}
>>> get_quote_count(tweet)
2
tweet_parser.getter_methods.tweet_counts.get_retweet_count(tweet)[source]

Gets the retweet count for this tweet.

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:The number of times the Tweet has been retweeted
Return type:int

Example

>>> from tweet_parser.getter_methods.tweet_counts import get_retweet_count
>>> tweet = {'created_at': '2017-21-23T15:21:21.000Z',
...          'id_str': '2382763597',
...          'retweet_count': 2}
>>> get_retweet_count(tweet)
2
>>> activity_streams_tweet = {'postedTime': '2017-05-24T20:17:19.000Z',
...                           'retweetCount': 3}
>>> get_retweet_count(activity_streams_tweet)
3

tweet_parser.getter_methods.tweet_date module

tweet_parser.getter_methods.tweet_date.snowflake2utc(sf)[source]

Convert a Twitter snowflake ID to a Unix timestamp (seconds since Jan 1 1970 00:00:00)

Parameters:sf (str) – Twitter snowflake ID as a string
Returns:seconds since Jan 1 1970 00:00:00
Return type:int
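
The conversion can be sketched as follows, assuming the commonly described snowflake layout in which the bits above the low 22 hold milliseconds elapsed since Twitter's snowflake epoch (1288834974657 ms after the Unix epoch). The function name `sketch_snowflake_to_unix` is hypothetical, and the library's own rounding may differ from the integer division used here.

```python
TWITTER_EPOCH_MS = 1288834974657  # Twitter's snowflake epoch, in Unix milliseconds


def sketch_snowflake_to_unix(sf):
    """Convert a snowflake ID (given as a string) to whole Unix seconds."""
    # Drop the low 22 bits; what remains is milliseconds since the snowflake epoch.
    ms_since_snowflake_epoch = int(sf) >> 22
    # Shift to the Unix epoch and truncate milliseconds to seconds.
    return (ms_since_snowflake_epoch + TWITTER_EPOCH_MS) // 1000


# A synthetic ID whose timestamp bits encode exactly 1000 ms after the snowflake epoch
example_id = str(1000 << 22)
seconds = sketch_snowflake_to_unix(example_id)
```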

tweet_parser.getter_methods.tweet_embeds module

tweet_parser.getter_methods.tweet_embeds.get_embedded_tweet(tweet)[source]

Get the retweeted Tweet OR the quoted Tweet and return it as a dictionary

Parameters:tweet (Tweet) – A Tweet object (not simply a dict)
Returns:a dictionary representing the quote Tweet or the Retweet
Return type:dict (or None, if the Tweet is neither a quote Tweet nor a Retweet)
tweet_parser.getter_methods.tweet_embeds.get_quoted_tweet(tweet)[source]

Get the quoted Tweet and return it as a dictionary. If the Tweet is not a quote Tweet, return None.

Parameters:tweet (Tweet or dict) – A Tweet object or a dictionary
Returns:A dictionary representing the quoted status or None if there is no quoted status.
  • For original format, this is the value of “quoted_status”
  • For activity streams, this is the value of “twitter_quoted_status”
Return type:dict
tweet_parser.getter_methods.tweet_embeds.get_retweeted_tweet(tweet)[source]

Get the retweeted Tweet and return it as a dictionary. If the Tweet is not a Retweet, return None.

Parameters:tweet (Tweet or dict) – A Tweet object or a dictionary
Returns:A dictionary representing the retweeted status or None if there is no retweeted status.
  • For original format, this is the value of “retweeted_status”
  • For activity streams, if the Tweet is a Retweet, this is the value of the key “object”
Return type:dict
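
The key lookups listed for the three functions in this module can be sketched on plain dictionaries as below. The `sketch_*` names are hypothetical. Two details are assumptions not stated in this reference: that an activity-streams Retweet is recognised by `verb == "share"`, and that the embedded-tweet lookup prefers the Retweet over the quoted Tweet.

```python
def sketch_quoted_tweet(payload):
    """Return the quoted status dict, or None if the payload is not a quote Tweet."""
    if "quoted_status" in payload:            # original format
        return payload["quoted_status"]
    return payload.get("twitter_quoted_status")  # activity streams, else None


def sketch_retweeted_tweet(payload):
    """Return the retweeted status dict, or None if the payload is not a Retweet."""
    if "retweeted_status" in payload:         # original format
        return payload["retweeted_status"]
    # Assumption: activity-streams Retweets are marked with verb == "share".
    if payload.get("verb") == "share":
        return payload.get("object")
    return None


def sketch_embedded_tweet(payload):
    """Return the retweeted or quoted status, whichever exists (Retweet first)."""
    retweeted = sketch_retweeted_tweet(payload)
    if retweeted is not None:
        return retweeted
    return sketch_quoted_tweet(payload)


quote = {"created_at": "Wed May 24 20:17:19 +0000 2017",
         "text": "adding my own commentary",
         "quoted_status": {"text": "an interesting Tweet"}}
activity_retweet = {"postedTime": "2017-05-24T20:17:19.000Z",
                    "verb": "share",
                    "object": {"body": "Stuff! Words!"}}
plain = {"created_at": "Wed May 24 20:17:19 +0000 2017", "text": "just a Tweet"}

embedded_quote = sketch_embedded_tweet(quote)
embedded_retweet = sketch_embedded_tweet(activity_retweet)
embedded_none = sketch_embedded_tweet(plain)
```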

tweet_parser.getter_methods.tweet_entities module

tweet_parser.getter_methods.tweet_entities.get_entities(tweet)[source]

Helper function to simplify grabbing the entities.

Caveat: In the case of Retweets, a Retweet is stored as “RT @someone: Some awesome status”. In the case where prepending the string “RT @someone:” causes the Tweet to exceed 140 characters, entities (hashtags, mentions, urls) beyond the 140 character mark are excluded from the Retweet’s entities. This seems like counterintuitive behavior, so we ensure here that the entities of a Retweet are a superset of the entities of the Retweeted status.

Parameters:tweet (Tweet or dict) – Tweet in question
Returns:dictionary of potential entities.
Return type:dict

Example

>>> from tweet_parser.getter_methods.tweet_entities import get_entities
>>> original = {"created_at": "Wed May 24 20:17:19 +0000 2017",
...             "entities": {"user_mentions": [{
...                              "indices": [14,26], #characters where the @ mention appears
...                              "id_str": "2382763597", #id of @ mentioned user as a string
...                              "screen_name": "notFromShrek", #screen_name of @ mentioned user
...                              "name": "Fiona", #display name of @ mentioned user
...                              "id": 2382763597 #id of @ mentioned user as an int
...                            }]
...                          }
...             }
>>> get_entities(original)
{'user_mentions': [{'indices': [14, 26], 'id_str': '2382763597', 'screen_name': 'notFromShrek', 'name': 'Fiona', 'id': 2382763597}]}
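
The Retweet caveat for get_entities can be illustrated with a small sketch. This is one possible way to build such a superset, not necessarily how the library does it: start from the Retweeted status's entities (which are not cut off) and put the Retweet's leading “RT @someone” mention in front. The name `sketch_retweet_entities` is hypothetical, and treating the Retweet's first user mention as the retweeted author is an assumption.

```python
import copy


def sketch_retweet_entities(retweet):
    """Build entities for an original-format Retweet from its retweeted status."""
    own = retweet.get("entities", {})
    # Deep copy so the embedded payload is left untouched.
    merged = copy.deepcopy(retweet["retweeted_status"].get("entities", {}))
    own_mentions = own.get("user_mentions", [])
    if own_mentions:
        # Assumption: the first mention on the Retweet is the "RT @someone" prefix.
        merged["user_mentions"] = [own_mentions[0]] + merged.get("user_mentions", [])
    return merged


retweet = {
    "text": "RT @notFromShrek: Stuff! Words! ...",
    # The hashtag is missing here because it fell past the truncation point.
    "entities": {"user_mentions": [{"screen_name": "notFromShrek"}], "hashtags": []},
    "retweeted_status": {
        "text": "Stuff! Words! #Tweeting!",
        "entities": {"user_mentions": [], "hashtags": [{"text": "Tweeting"}]},
    },
}
entities = sketch_retweet_entities(retweet)
```

In this sketch the hashtag dropped from the truncated Retweet is recovered from the Retweeted status, while the mention of the retweeted user is kept.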
tweet_parser.getter_methods.tweet_entities.get_hashtags(tweet)[source]

Get a list of hashtags in the Tweet. Note that in the case of a quote-tweet, this does not return the hashtags in the quoted status.

Parameters:tweet (Tweet or dict) – A Tweet object or dictionary
Returns:list of all of the hashtags in the Tweet
Return type:list (a list of strings)

Example

>>> from tweet_parser.getter_methods.tweet_entities import get_hashtags
>>> original = {"created_at": "Wed May 24 20:17:19 +0000 2017",
...            "entities": {"hashtags": [{"text":"1hashtag"}]}}
>>> get_hashtags(original)
['1hashtag']
>>> activity = {"postedTime": "2017-05-24T20:17:19.000Z",
...             "verb": "post",
...             "twitter_entities": {"hashtags": [
...                     {"text":"1hashtag"},
...                     {"text": "moreHashtags"}]}}
>>> get_hashtags(activity)
['1hashtag', 'moreHashtags']
tweet_parser.getter_methods.tweet_entities.get_media_entities(tweet)[source]

Grabs all the media entities from a tweet, which are contained in the “extended_entities” or “twitter_extended_entities” field depending on the tweet format. Note that this is not the same as the first media entity from the basic entities key; this is required to get all of the potential media contained within a tweet. This is useful as an entry point for other functions or for any custom parsing that needs to be done.

Parameters:tweet (Tweet or dict) – the tweet in question
Returns:the list of dicts containing each media’s metadata in the tweet.
Return type:list or None

Example

>>> from tweet_parser.getter_methods.tweet_entities import get_media_entities
>>> tweet = {'created_at': '2017-21-23T15:21:21.000Z',
...          'entities': {'user_mentions': [{'id': 2382763597,
...          'id_str': '2382763597',
...          'indices': [14, 26],
...          'name': 'Fiona',
...          'screen_name': 'notFromShrek'}]},
...          'extended_entities': {'media': [{'display_url': 'pic.twitter.com/something',
...          'expanded_url': 'https://twitter.com/something',
...          'id': 4242,
...          'id_str': '4242',
...          'indices': [88, 111],
...          'media_url': 'http://pbs.twimg.com/media/something.jpg',
...          'media_url_https': 'https://pbs.twimg.com/media/something.jpg',
...          'sizes': {'large': {'h': 1065, 'resize': 'fit', 'w': 1600},
...          'medium': {'h': 799, 'resize': 'fit', 'w': 1200},
...          'small': {'h': 453, 'resize': 'fit', 'w': 680},
...          'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
...          'type': 'photo',
...          'url': 'https://t.co/something'},
...          {'display_url': 'pic.twitter.com/something_else',
...          'expanded_url': 'https://twitter.com/user/status/something/photo/1',
...          'id': 4243,
...          'id_str': '4243',
...          'indices': [88, 111],
...          'media_url': 'http://pbs.twimg.com/media/something_else.jpg',
...          'media_url_https': 'https://pbs.twimg.com/media/something_else.jpg',
...          'sizes': {'large': {'h': 1065, 'resize': 'fit', 'w': 1600},
...          'medium': {'h': 799, 'resize': 'fit', 'w': 1200},
...          'small': {'h': 453, 'resize': 'fit', 'w': 680},
...          'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
...          'type': 'photo',
...          'url': 'https://t.co/something_else'}]}
...         }
>>> get_media_entities(tweet)
[{'display_url': 'pic.twitter.com/something', 'expanded_url': 'https://twitter.com/something', 'id': 4242, 'id_str': '4242', 'indices': [88, 111], 'media_url': 'http://pbs.twimg.com/media/something.jpg', 'media_url_https': 'https://pbs.twimg.com/media/something.jpg', 'sizes': {'large': {'h': 1065, 'resize': 'fit', 'w': 1600}, 'medium': {'h': 799, 'resize': 'fit', 'w': 1200}, 'small': {'h': 453, 'resize': 'fit', 'w': 680}, 'thumb': {'h': 150, 'resize': 'crop', 'w': 150}}, 'type': 'photo', 'url': 'https://t.co/something'}, {'display_url': 'pic.twitter.com/something_else', 'expanded_url': 'https://twitter.com/user/status/something/photo/1', 'id': 4243, 'id_str': '4243', 'indices': [88, 111], 'media_url': 'http://pbs.twimg.com/media/something_else.jpg', 'media_url_https': 'https://pbs.twimg.com/media/something_else.jpg', 'sizes': {'large': {'h': 1065, 'resize': 'fit', 'w': 1600}, 'medium': {'h': 799, 'resize': 'fit', 'w': 1200}, 'small': {'h': 453, 'resize': 'fit', 'w': 680}, 'thumb': {'h': 150, 'resize': 'crop', 'w': 150}}, 'type': 'photo', 'url': 'https://t.co/something_else'}]
tweet_parser.getter_methods.tweet_entities.get_media_urls(tweet)[source]

Gets the https links to each media entity in the tweet.

Parameters:tweet (Tweet or dict) – tweet
Returns:list of urls. Will be an empty list if there are no urls present.
Return type:list

Example

>>> from tweet_parser.getter_methods.tweet_entities import get_media_urls
>>> tweet = {'created_at': '2017-21-23T15:21:21.000Z',
...          'entities': {'user_mentions': [{'id': 2382763597,
...          'id_str': '2382763597',
...          'indices': [14, 26],
...          'name': 'Fiona',
...          'screen_name': 'notFromShrek'}]},
...          'extended_entities': {'media': [{'display_url': 'pic.twitter.com/something',
...          'expanded_url': 'https://twitter.com/something',
...          'id': 4242,
...          'id_str': '4242',
...          'indices': [88, 111],
...          'media_url': 'http://pbs.twimg.com/media/something.jpg',
...          'media_url_https': 'https://pbs.twimg.com/media/something.jpg',
...          'sizes': {'large': {'h': 1065, 'resize': 'fit', 'w': 1600},
...          'medium': {'h': 799, 'resize': 'fit', 'w': 1200},
...          'small': {'h': 453, 'resize': 'fit', 'w': 680},
...          'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
...          'type': 'photo',
...          'url': 'https://t.co/something'},
...          {'display_url': 'pic.twitter.com/something_else',
...          'expanded_url': 'https://twitter.com/user/status/something/photo/1',
...          'id': 4243,
...          'id_str': '4243',
...          'indices': [88, 111],
...          'media_url': 'http://pbs.twimg.com/media/something_else.jpg',
...          'media_url_https': 'https://pbs.twimg.com/media/something_else.jpg',
...          'sizes': {'large': {'h': 1065, 'resize': 'fit', 'w': 1600},
...          'medium': {'h': 799, 'resize': 'fit', 'w': 1200},
...          'small': {'h': 453, 'resize': 'fit', 'w': 680},
...          'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
...          'type': 'photo',
...          'url': 'https://t.co/something_else'}]}
...         }
>>> get_media_urls(tweet)
['https://pbs.twimg.com/media/something.jpg', 'https://pbs.twimg.com/media/something_else.jpg']
tweet_parser.getter_methods.tweet_entities.get_user_mentions(tweet)[source]

Get the @-mentions in the Tweet as dictionaries. Note that in the case of a quote-tweet, this does not return the users mentioned in the quoted status. The recommended way to get that list would be to use get_user_mentions on the quoted status. Also note that in the case of a quote-tweet, the list of @-mentioned users does not include the user who authored the original (quoted) Tweet.

Parameters:tweet (Tweet or dict) – A Tweet object or dictionary
Returns:1 item per @ mention. Note that the fields here aren’t enforced by the parser, they are simply the fields as they appear in a Tweet data payload.
Return type:list (list of dicts)

Example

>>> from tweet_parser.getter_methods.tweet_entities import get_user_mentions
>>> original = {"created_at": "Wed May 24 20:17:19 +0000 2017",
...             "text": "RT @notFromShrek: Stuff! Words! ...",
...             "entities": {"user_mentions": [{
...                              "indices": [2,12], #characters where the @ mention appears
...                              "id_str": "2382763597", #id of @ mentioned user as a string
...                              "screen_name": "notFromShrek", #screen_name of @ mentioned user
...                              "name": "Fiona", #display name of @ mentioned user
...                              "id": 2382763597 #id of @ mentioned user as an int
...                            }]
...                          },
...             "retweeted_status": {
...                 "created_at": "Wed May 24 20:01:19 +0000 2017",
...                 "text": "Stuff! Words! #Tweeting!",
...                 "entities": {"user_mentions": []}
...                 }
...             }
>>> get_user_mentions(original)
[{'indices': [2, 12], 'id_str': '2382763597', 'screen_name': 'notFromShrek', 'name': 'Fiona', 'id': 2382763597}]

tweet_parser.getter_methods.tweet_generator module

class tweet_parser.getter_methods.tweet_generator.GeneratorHTMLParser(*, convert_charrefs=True)[source]

Bases: html.parser.HTMLParser

HTML parser class to handle HTML tags in the original format source field

handle_data(data)[source]
handle_starttag(tag, attrs)[source]
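
A minimal sketch of how an html.parser.HTMLParser subclass can pull the link and the application name out of an original-format source field. The class name `SketchSourceParser` and its attributes are hypothetical and only illustrate the two handler methods listed above; the library's own class may store its results differently.

```python
from html.parser import HTMLParser


class SketchSourceParser(HTMLParser):
    """Collect the href and the link text from a source-field anchor tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.link = None
        self.name_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            self.link = dict(attrs).get("href")

    def handle_data(self, data):
        # Text between tags; collected in a list in case it arrives in chunks
        self.name_parts.append(data)


parser = SketchSourceParser()
parser.feed('<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>')
parser.close()
generator = {"link": parser.link, "name": "".join(parser.name_parts)}
```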
tweet_parser.getter_methods.tweet_generator.get_generator(tweet)[source]

Get information about the application that generated the Tweet

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:keys are ‘link’ and ‘name’, the web link and the name of the application
Return type:dict

Example

>>> from tweet_parser.getter_methods.tweet_generator import get_generator
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'
...            }
>>> get_generator(original_format_dict)
{'link': 'http://twitter.com', 'name': 'Twitter Web Client'}
>>> activity_streams_format_dict = {
...             "postedTime": "2017-05-24T20:17:19.000Z",
...             "generator":
...              {"link": "http://twitter.com",
...               "displayName": "Twitter Web Client"}
...             }
>>> get_generator(activity_streams_format_dict)
{'link': 'http://twitter.com', 'name': 'Twitter Web Client'}

tweet_parser.getter_methods.tweet_geo module

tweet_parser.getter_methods.tweet_geo.get_geo_coordinates(tweet)[source]

Get the user’s geo coordinates, if they are included in the payload (otherwise return None)

Parameters:tweet (Tweet or dict) – A Tweet object or dictionary
Returns:dictionary with the keys “latitude” and “longitude” or, if unavailable, None
Return type:dict

Example

>>> from tweet_parser.getter_methods.tweet_geo import get_geo_coordinates
>>> tweet_geo = {"geo": {"coordinates": [1,-1]}}
>>> get_geo_coordinates(tweet_geo)
{'latitude': 1, 'longitude': -1}
>>> tweet_no_geo = {"geo": {}}
>>> get_geo_coordinates(tweet_no_geo) #returns None
tweet_parser.getter_methods.tweet_geo.get_profile_location(tweet)[source]

Get the user’s derived location data from the profile location enrichment. If unavailable, returns None.

Parameters:tweet (Tweet or dict) – Tweet object or dictionary
Returns:the user’s derived profile location, or None if unavailable. More information on the profile locations enrichment here: http://support.gnip.com/enrichments/profile_geo.html
Return type:dict

Example

>>> result = {"country": "US",         # Two letter ISO-3166 country code
...           "locality": "Boulder",   # The locality location (~ city)
...           "region": "Colorado",    # The region location (~ state/province)
...           "sub_region": "Boulder", # The sub-region location (~ county)
...           "full_name": "Boulder, Colorado, US", # The full name (excluding sub-region)
...           "geo":  [40,-105]        # lat/long coordinate that corresponds to
...                            # the lowest granularity location for where the user
...                            # who created the Tweet is from
...  }
Caveats:
This only returns the first element of the ‘locations’ list. I’m honestly not sure what circumstances would result in a list that is more than one element long.
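
The first-element behaviour noted in the caveat can be sketched as below for an original-format payload. The function name `sketch_profile_location` is hypothetical, and the path `user` → `derived` → `locations` is an assumption for illustration; activity-streams payloads are not covered by this sketch.

```python
def sketch_profile_location(payload):
    """Return the first derived profile location, or None if there is none."""
    # Assumption: original-format payloads keep the enrichment under user.derived.locations.
    derived = (payload.get("user") or {}).get("derived") or {}
    locations = derived.get("locations") or []
    # Only the first element is returned, mirroring the caveat above.
    return locations[0] if locations else None


enriched = {"user": {"derived": {"locations": [
    {"country": "US", "locality": "Boulder", "region": "Colorado"},
    {"country": "US", "locality": "Denver", "region": "Colorado"},
]}}}
not_enriched = {"user": {}}

first_location = sketch_profile_location(enriched)
no_location = sketch_profile_location(not_enriched)
```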

tweet_parser.getter_methods.tweet_reply module

tweet_parser.getter_methods.tweet_reply.get_in_reply_to_screen_name(tweet)[source]

Get the screen name of the user whose Tweet is being replied to, None if this Tweet is not a reply

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the screen name of the user whose Tweet is being replied to (None if not a reply)
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_reply import get_in_reply_to_screen_name
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "in_reply_to_screen_name": "notFromShrek"
...            }
>>> get_in_reply_to_screen_name(original_format_dict)
'notFromShrek'
>>> activity_streams_format_dict = {
...         "postedTime": "2017-05-24T20:17:19.000Z",
...         "inReplyTo":
...            {"link": "http://twitter.com/notFromShrek/statuses/863566329168711681"}
...         }
>>> get_in_reply_to_screen_name(activity_streams_format_dict)
'notFromShrek'
tweet_parser.getter_methods.tweet_reply.get_in_reply_to_status_id(tweet)[source]

Get the tweet id of the Tweet being replied to, None if this Tweet is not a reply

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the tweet id of the Tweet being replied to (None if not a reply)
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_reply import get_in_reply_to_status_id
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "in_reply_to_status_id_str": "863566329168711681"
...            }
>>> get_in_reply_to_status_id(original_format_dict)
'863566329168711681'
>>> activity_streams_format_dict = {
...         "postedTime": "2017-05-24T20:17:19.000Z",
...         "inReplyTo":
...            {"link": "http://twitter.com/notFromShrek/statuses/863566329168711681"}
...         }
>>> get_in_reply_to_status_id(activity_streams_format_dict)
'863566329168711681'
tweet_parser.getter_methods.tweet_reply.get_in_reply_to_user_id(tweet)[source]

Get the user id of the user whose Tweet is being replied to, and None if this Tweet is not a reply.

Note that this is unavailable in activity-streams format

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the user id of the user whose Tweet is being replied to, or None if this Tweet is not a reply. Raises a NotAvailableError for activity-streams format.
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_reply import get_in_reply_to_user_id
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "in_reply_to_user_id_str": "2382763597"
...            }
>>> get_in_reply_to_user_id(original_format_dict)
'2382763597'

tweet_parser.getter_methods.tweet_text module

tweet_parser.getter_methods.tweet_text.get_all_text(tweet)[source]

Get all of the text of the tweet. This includes @ mentions, long links, quote-tweet contents (separated by a newline), RT contents & poll options

Parameters:tweet (Tweet) – A Tweet object (must be a Tweet object)
Returns:text from tweet.user_entered_text, tweet.quote_or_rt_text and tweet.poll_options (if in original format), separated by newlines
Return type:str
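
The newline-joining described for get_all_text can be sketched with plain strings. The helper name `sketch_all_text` and its three arguments are hypothetical stand-ins for the tweet.user_entered_text, tweet.quote_or_rt_text and tweet.poll_options values named above; skipping empty pieces is an assumption.

```python
def sketch_all_text(user_entered_text, quote_or_rt_text, poll_options):
    """Join the non-empty text pieces of a Tweet with newlines."""
    parts = [user_entered_text, quote_or_rt_text, "\n".join(poll_options)]
    # Assumption: empty pieces are dropped so no blank lines appear in the result.
    return "\n".join(part for part in parts if part)


quote_text = sketch_all_text("adding my own commentary", "an interesting Tweet", [])
poll_text = sketch_all_text("which one?", "", ["a", "b"])
```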
tweet_parser.getter_methods.tweet_text.get_full_text(tweet)[source]

Get the full text of a tweet dict. Includes @-mention replies and long links.

Parameters:tweet (Tweet or dict) – A Tweet object or dictionary
Returns:the untruncated text of a Tweet (finds extended text if available)
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_text import get_full_text
>>> # getting the text of a Tweet that is not truncated
>>> original_untruncated = {
...                 "created_at": "Wed May 24 20:17:19 +0000 2017",
...                 "truncated": False,
...                 "text": "some tweet text"
...                }
>>> get_full_text(original_untruncated)
'some tweet text'
>>> activity_untruncated = {"postedTime": "2017-05-24T20:17:19.000Z",
...                         "body": "some tweet text"
...                        }
>>> get_full_text(activity_untruncated)
'some tweet text'
>>> # getting the text of a truncated Tweet (has over 140 chars)
>>> original_truncated = {
...               "created_at": "Wed May 24 20:17:19 +0000 2017",
...               "text": "some tweet text, lorem ip...",
...               "truncated": True,
...               "extended_tweet":
...                 {"full_text":
...                   "some tweet text, lorem ipsum dolor sit amet"}
...               }
>>> get_full_text(original_truncated)
'some tweet text, lorem ipsum dolor sit amet'
>>> activity_truncated = {
...               "postedTime": "2017-05-24T20:17:19.000Z",
...               "body": "some tweet text, lorem ip...",
...               "long_object":
...                 {"body":
...                   "some tweet text, lorem ipsum dolor sit amet"}
...              }
>>> get_full_text(activity_truncated)
'some tweet text, lorem ipsum dolor sit amet'
tweet_parser.getter_methods.tweet_text.get_lang(tweet)[source]

Get the language that the Tweet is written in.

Parameters:tweet (Tweet or dict) – A Tweet object or dictionary
Returns:2-letter BCP 47 language code (or None if undefined)
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_text import get_lang
>>> original = {"created_at": "Wed May 24 20:17:19 +0000 2017",
...             "lang": "en"}
>>> get_lang(original)
'en'
>>> activity = {"postedTime": "2017-05-24T20:17:19.000Z",
...             "twitter_lang": "en"}
>>> get_lang(activity)
'en'
tweet_parser.getter_methods.tweet_text.get_poll_options(tweet)[source]

Get the text in the options of a poll as a list.
  • If there is no poll in the Tweet, return an empty list
  • If the Tweet is in activity-streams format, raise ‘NotAvailableError’

Parameters:tweet (Tweet or dict) – A Tweet object or dictionary
Returns:list of strings, or, in the case where there is no poll, an empty list
Return type:list
Raises:NotAvailableError for activity-streams format

Example

>>> from tweet_parser.getter_methods.tweet_text import get_poll_options
>>> original = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "entities": {"polls": [{"options": [{"text":"a"},
...                                                 {"text":"b"},
...                                                 {"text":"c"}]
...                             }]},
...            }
>>> get_poll_options(original)
['a', 'b', 'c']
>>> activity = {"postedTime": "2017-05-24T20:17:19.000Z",
...             "body": "some tweet text"}
>>> get_poll_options(activity)
Traceback (most recent call last):
...
NotAvailableError: Gnip activity-streams format does not return poll options
tweet_parser.getter_methods.tweet_text.get_quote_or_rt_text(tweet)[source]

Get the quoted or retweeted text in a Tweet (this is not the text entered by the posting user).
  • tweet: empty string (there is no quoted or retweeted text)
  • quote: only the text of the quoted Tweet
  • retweet: the text of the retweet

Parameters:tweet (Tweet or dict) – A Tweet object or dictionary
Returns:text of the retweeted-tweet or the quoted-tweet (empty string if this is an original Tweet)
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_text import get_quote_or_rt_text
>>> # a quote tweet
>>> quote = {"created_at": "Wed May 24 20:17:19 +0000 2017",
...          "text": "adding my own commentary",
...          "truncated": False,
...          "quoted_status": {
...                 "created_at": "Mon May 01 05:00:05 +0000 2017",
...                 "truncated": False,
...                 "text": "an interesting Tweet"
...                }
...         }
>>> get_quote_or_rt_text(quote)
'an interesting Tweet'
tweet_parser.getter_methods.tweet_text.get_text(tweet)[source]

Get the contents of “text” (original format) or “body” (activity streams format)

Parameters:tweet (Tweet or dict) – A Tweet object or dictionary
Returns:the contents of “text” key (original format) or “body” key (activity streams format)
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_text import get_text
>>> original = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "text": "some tweet text"}
>>> get_text(original)
'some tweet text'
>>> activity = {"postedTime": "2017-05-24T20:17:19.000Z",
...             "body": "some tweet text"}
>>> get_text(activity)
'some tweet text'
tweet_parser.getter_methods.tweet_text.get_tweet_type(tweet)[source]

Get the type of Tweet this is (3 options: tweet, quote, and retweet)

Parameters:tweet (Tweet or dict) – A Tweet object or dictionary
Returns:one of 3 strings:
  • “tweet”: an original Tweet
  • “retweet”: a native retweet (created with the retweet button)
  • “quote”: a native quote tweet (retweet button + adding quote text)
Return type:str
Caveats:
When a quote-tweet (tweet A) is quote-tweeted (tweet B), the innermost quoted tweet (A) in the payload (for B) no longer has the key “quoted_status” or “twitter_quoted_status”, and that tweet (A) would be labeled as a “tweet” (not a “quote”).
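
The caveat can be reproduced with a small key-based classifier for original-format dictionaries. The function `sketch_tweet_type` is a hypothetical illustration, and checking for a Retweet before a quote is an assumption; the library's own logic may differ, in particular for activity-streams payloads.

```python
def sketch_tweet_type(payload):
    """Classify a payload dict as "retweet", "quote" or "tweet" by its keys."""
    if "retweeted_status" in payload:
        return "retweet"
    if "quoted_status" in payload or "twitter_quoted_status" in payload:
        return "quote"
    return "tweet"


# Tweet B quotes tweet A, and tweet A was itself a quote-tweet. Inside B's
# payload, A appears without its own "quoted_status" key.
tweet_b = {"text": "quoting a quote",
           "quoted_status": {"text": "tweet A, originally a quote-tweet"}}

outer_type = sketch_tweet_type(tweet_b)
inner_type = sketch_tweet_type(tweet_b["quoted_status"])
```

Here the outer payload is classified as “quote”, while the embedded tweet A is classified as “tweet” because its own quoted status is no longer present.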

tweet_parser.getter_methods.tweet_text.remove_links(text)[source]

Helper function to remove the links from the input text

Parameters:text (str) – A string
Returns:the same text, but with any substring that matches the regex for a link removed and replaced with a space
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_text import remove_links
>>> text = "lorem ipsum dolor https://twitter.com/RobotPrincessFi"
>>> remove_links(text)
'lorem ipsum dolor  '

tweet_parser.getter_methods.tweet_user module

tweet_parser.getter_methods.tweet_user.get_bio(tweet)[source]

Get the bio text of the user who posted the Tweet

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the bio text of the user who posted the Tweet. In a payload, the absence of a bio seems to be represented by an empty string or None; this getter always returns a string (an empty string if no bio is available).
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_user import get_bio
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "user":
...              {"description": "Niche millenial content aggregator"}
...            }
>>> get_bio(original_format_dict)
'Niche millenial content aggregator'
>>> activity_streams_format_dict = {
...             "postedTime": "2017-05-24T20:17:19.000Z",
...             "actor":
...              {"summary": "Niche millenial content aggregator"}
...             }
>>> get_bio(activity_streams_format_dict)
'Niche millenial content aggregator'
tweet_parser.getter_methods.tweet_user.get_follower_count(tweet)[source]

Get the number of followers that the user has

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the number of followers that the user has
Return type:int

Example

>>> from tweet_parser.getter_methods.tweet_user import get_follower_count
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "user":
...              {"followers_count": 2}
...            }
>>> get_follower_count(original_format_dict)
2
>>> activity_streams_format_dict = {
...             "postedTime": "2017-05-24T20:17:19.000Z",
...             "actor":
...              {"followersCount": 2}
...             }
>>> get_follower_count(activity_streams_format_dict)
2
tweet_parser.getter_methods.tweet_user.get_following_count(tweet)[source]

Get the number of accounts that the user is following

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the number of accounts that the user is following
Return type:int

Example

>>> from tweet_parser.getter_methods.tweet_user import get_following_count
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "user":
...              {"friends_count": 2}
...            }
>>> get_following_count(original_format_dict)
2
>>> activity_streams_format_dict = {
...             "postedTime": "2017-05-24T20:17:19.000Z",
...             "actor":
...              {"friendsCount": 2}
...             }
>>> get_following_count(activity_streams_format_dict)
2
tweet_parser.getter_methods.tweet_user.get_klout_id(tweet)[source]

Warning: Klout is deprecated and is being removed from Tweet payloads May 2018.

See https://developer.twitter.com/en/docs/tweets/enrichments/overview/klout

Get the Klout ID of the user (str) (if it exists)

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the user’s Klout ID (if it exists), else return None
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_user import get_klout_id
>>> original_format_dict = {
... "created_at": "Wed May 24 20:17:19 +0000 2017",
...     "user":
...         {"derived": {"klout":
...             {"user_id":"1234567890"}}}
...     }
>>> get_klout_id(original_format_dict)
'1234567890'
>>> activity_streams_format_dict = {
... "postedTime": "2017-05-24T20:17:19.000Z",
... "gnip":
...     {"klout_profile": {
...         "klout_user_id": "1234567890"}
...     }}
>>> get_klout_id(activity_streams_format_dict)
'1234567890'
tweet_parser.getter_methods.tweet_user.get_klout_profile(tweet)[source]

Warning: Klout is deprecated and is being removed from Tweet payloads May 2018.

See https://developer.twitter.com/en/docs/tweets/enrichments/overview/klout

Get the Klout profile URL of the user (str) (if it exists)

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the user’s Klout profile URL (if it exists), else return None
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_user import get_klout_profile
>>> original_format_dict = {
... "created_at": "Wed May 24 20:17:19 +0000 2017",
... "user":
...     {"derived": {"klout":
...         {"profile_url":
...             "http://klout.com/topic/id/10000000000000016635"}}}
... }
>>> get_klout_profile(original_format_dict)
'http://klout.com/topic/id/10000000000000016635'
>>> activity_streams_format_dict = {
... "postedTime": "2017-05-24T20:17:19.000Z",
... "gnip":
...     {"klout_profile": {
...         "link": "http://klout.com/topic/id/10000000000000016635"}
...     }
... }
>>> get_klout_profile(activity_streams_format_dict)
'http://klout.com/topic/id/10000000000000016635'
tweet_parser.getter_methods.tweet_user.get_klout_score(tweet)[source]

Warning: Klout is deprecated and is being removed from Tweet payloads in May 2018.

See https://developer.twitter.com/en/docs/tweets/enrichments/overview/klout

Get the Klout score (int) (if it exists) of the user who posted the Tweet

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the Klout score (if it exists) of the user who posted the Tweet, else return None
Return type:int

Example

>>> from tweet_parser.getter_methods.tweet_user import get_klout_score
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "user":
...              {"derived": {"klout": {"score": 12345}}}
...            }
>>> get_klout_score(original_format_dict)
12345
>>> activity_streams_format_dict = {
...             "postedTime": "2017-05-24T20:17:19.000Z",
...             "gnip":{"klout_score": 12345}}
>>> get_klout_score(activity_streams_format_dict)
12345
tweet_parser.getter_methods.tweet_user.get_klout_topics(tweet, topic_type='influence')[source]

Warning: Klout is deprecated and is being removed from Tweet payloads in May 2018.

See https://developer.twitter.com/en/docs/tweets/enrichments/overview/klout

Get the user’s chosen Klout topics (a list of dicts), if they exist. Regardless of format or topic type, topic dicts will have the same keys: “url”, “id”, “name”, “score”

Parameters:
  • tweet (Tweet) – A Tweet object
  • topic_type (str) – Which type of Klout topic to return. Options are limited to ‘influence’ and ‘interest’
Returns:A list of dicts representing Klout topics, or if Klout topics do not exist in the Tweet payload, return None. The list is sorted by the “score” value.
Return type:list

Example

>>> result = [{
...     # the user's score for that topic
...     "score": 0.54,
...     # the Klout topic ID
...     "id": "10000000000000019376",
...     # the Klout topic URL
...     "url": "http://klout.com/topic/id/10000000000000019376",
...     # the Klout topic name
...     "name": "Emoji"
... },
... {
...     "score": 0.43,
...     "id": "9159",
...     "url": "http://klout.com/topic/id/9159",
...     "name": "Vegetables"
... }]
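
Because the return value is either a list of topic dicts or None, calling code usually needs to handle both cases. The sketch below shows one way to pick the highest-scoring topic from such a result. top_topic is a hypothetical helper, not part of tweet_parser, and it uses max() so that it does not depend on the direction in which the list is sorted.

```python
def top_topic(topics):
    """Return the highest-scoring topic dict, or None if there are no topics."""
    if not topics:  # covers both None and an empty list
        return None
    return max(topics, key=lambda topic: topic["score"])


# Same shape as the example result above
topics = [
    {"score": 0.54, "id": "10000000000000019376",
     "url": "http://klout.com/topic/id/10000000000000019376", "name": "Emoji"},
    {"score": 0.43, "id": "9159",
     "url": "http://klout.com/topic/id/9159", "name": "Vegetables"},
]

best = top_topic(topics)
print(best["name"])
```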
tweet_parser.getter_methods.tweet_user.get_name(tweet)[source]

Get the display name of the user who posted the Tweet

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the display name of the user who posted the Tweet
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_user import get_name
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "user":
...              {"name": "jk no"}
...            }
>>> get_name(original_format_dict)
'jk no'
>>> activity_streams_format_dict = {
...             "postedTime": "2017-05-24T20:17:19.000Z",
...             "actor":
...              {"displayName": "jk no"}
...             }
>>> get_name(activity_streams_format_dict)
'jk no'
tweet_parser.getter_methods.tweet_user.get_screen_name(tweet)[source]

Get the screen name (@ handle) of the user who posted the Tweet

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the @ handle of the user who posted the Tweet
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_user import get_screen_name
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "user":
...              {"screen_name": "RobotPrincessFi"}
...            }
>>> get_screen_name(original_format_dict)
'RobotPrincessFi'
>>> activity_streams_format_dict = {
...             "postedTime": "2017-05-24T20:17:19.000Z",
...             "actor":
...              {"preferredUsername": "RobotPrincessFi"}
...             }
>>> get_screen_name(activity_streams_format_dict)
'RobotPrincessFi'
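
The display name (get_name) and the screen name (get_screen_name) live under different keys in each payload format. The following sketch puts the four keys from the examples side by side. author_names is a hypothetical helper, and detecting the format by the presence of a "user" key is an assumption made for illustration; the library has its own format detection. The sample payloads combine values from the separate examples above purely for demonstration.

```python
def author_names(tweet):
    """Return (display_name, screen_name) from either payload format."""
    if "user" in tweet:
        # Original format: both values sit under "user"
        user = tweet["user"]
        return user.get("name"), user.get("screen_name")
    # Activity-streams format: both values sit under "actor"
    actor = tweet.get("actor", {})
    return actor.get("displayName"), actor.get("preferredUsername")


original_format_dict = {
    "created_at": "Wed May 24 20:17:19 +0000 2017",
    "user": {"name": "jk no", "screen_name": "RobotPrincessFi"},
}
activity_streams_format_dict = {
    "postedTime": "2017-05-24T20:17:19.000Z",
    "actor": {"displayName": "jk no", "preferredUsername": "RobotPrincessFi"},
}

print(author_names(original_format_dict))
print(author_names(activity_streams_format_dict))
```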
tweet_parser.getter_methods.tweet_user.get_user_id(tweet)[source]

Get the Twitter ID of the user who posted the Tweet

Parameters:tweet (Tweet) – A Tweet object (or a dictionary)
Returns:the Twitter ID of the user who posted the Tweet
Return type:str

Example

>>> from tweet_parser.getter_methods.tweet_user import get_user_id
>>> original_format_dict = {
...             "created_at": "Wed May 24 20:17:19 +0000 2017",
...             "user":
...              {"id_str": "815279070241955840"}
...            }
>>> get_user_id(original_format_dict)
'815279070241955840'
>>> activity_streams_format_dict = {
...             "postedTime": "2017-05-24T20:17:19.000Z",
...             "actor":
...              {"id": "id:twitter.com:815279070241955840"}
...             }
>>> get_user_id(activity_streams_format_dict)
'815279070241955840'
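
In the activity-streams example above, the actor id carries an "id:twitter.com:" prefix, while the returned value is the bare numeric string. One way to perform that normalization is sketched here. bare_user_id is a hypothetical helper and may differ from how the library does it.

```python
def bare_user_id(actor_id):
    """Drop everything up to the last colon, leaving only the numeric ID string."""
    # "id:twitter.com:815279070241955840" -> "815279070241955840"
    # A string without a colon is returned unchanged.
    return actor_id.rsplit(":", 1)[-1]


print(bare_user_id("id:twitter.com:815279070241955840"))
```

Splitting from the right keeps the helper independent of the exact prefix, and the ID stays a string, which matches the documented return type.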

Module contents