Module to define a lazy property decorator that allows attributes to be generated dynamically and cached after creation. Original idea found via http://stevenloria.com/lazy-evaluated-properties-in-python/ and lightly modified to preserve underlying docstrings.
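A minimal sketch of such a decorator, assuming the cached value is stored on the instance under a private attribute name. The names `lazy_property` and `Example` are illustrative and are not taken from the library; the real implementation may differ.

```python
import functools


def lazy_property(fn):
    """Compute the wrapped attribute on first access, then reuse the cached value."""
    attr_name = "_lazy_" + fn.__name__  # hypothetical cache slot on the instance

    @property
    @functools.wraps(fn)  # copies the docstring of the wrapped function
    def _lazy(self):
        if not hasattr(self, attr_name):
            setattr(self, attr_name, fn(self))
        return getattr(self, attr_name)

    return _lazy


class Example:
    calls = 0  # counts how often the body of `value` actually runs

    @lazy_property
    def value(self):
        """An expensive value."""
        Example.calls += 1
        return 42


obj = Example()
first, second = obj.value, obj.value  # the second access hits the cache
```

Because `functools.wraps` is applied before `property`, the resulting property carries the docstring of the original function.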
tweet_parser.tweet.
Tweet
(tweet_dict, do_format_validation=False)[source]¶Bases: dict
Tweet object created from a dictionary representing a Tweet payload
Parameters: |
|
---|---|
Returns: | Class “Tweet”, inherits from dict, provides properties to get various data values from the Tweet. |
Return type: | |
Raises: | NotATweetError – the Tweet dict is malformed, see tweet_checking.check_tweet for details |
Example
>>> from tweet_parser.tweet import Tweet
>>> # python dict representing a Tweet
>>> tweet_dict = {"id": 867474613139156993,
... "id_str": "867474613139156993",
... "created_at": "Wed May 24 20:17:19 +0000 2017",
... "text": "Some Tweet text",
... "user": {
... "screen_name": "RobotPrincessFi",
... "id_str": "815279070241955840"
... }
... }
>>> # create a Tweet object
>>> tweet = Tweet(tweet_dict)
>>> # use the Tweet obj to access data elements
>>> tweet.id
'867474613139156993'
>>> tweet.created_at_seconds
1495657039
all_text
¶All of the text of the tweet. This includes @ mentions, long links, quote-tweet contents (separated by a newline), RT contents & poll options
Returns: | value returned by calling tweet_text.get_all_text on self |
---|---|
Return type: | str |
bio
¶The bio text of the user who posted the Tweet
Returns: | the user’s bio text. value returned by calling tweet_user.get_bio on self |
---|---|
Return type: | str |
created_at_datetime
¶Time that a Tweet was posted as a Python datetime object
Returns: | the value of tweet.created_at_seconds converted into a datetime object |
---|---|
Return type: | datetime.datetime |
created_at_seconds
¶Time that a Tweet was posted in seconds since the Unix epoch
Returns: | seconds since the unix epoch (determined by converting Tweet.id into a timestamp using tweet_date.snowflake2utc) |
---|---|
Return type: | int |
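The id-to-timestamp conversion can be sketched as follows. This assumes the publicly documented snowflake layout (a millisecond timestamp stored above the low 22 bits, offset from the Unix epoch by 1288834974657 ms). The name `snowflake_to_utc_seconds` is illustrative; the behavior of tweet_date.snowflake2utc may differ in detail.

```python
TWITTER_EPOCH_MS = 1288834974657  # snowflake epoch, in ms since the Unix epoch


def snowflake_to_utc_seconds(snowflake_id):
    """Return whole seconds since the Unix epoch encoded in a snowflake id."""
    # Drop the low 22 bits (worker and sequence bits), then shift to the Unix epoch.
    milliseconds = (int(snowflake_id) >> 22) + TWITTER_EPOCH_MS
    return milliseconds // 1000


seconds = snowflake_to_utc_seconds("867474613139156993")
```

For the id used in the Tweet example above, this gives 1495657039, matching the value shown for tweet.created_at_seconds.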
created_at_string
¶Time that a Tweet was posted as a string with the format YYYY-mm-ddTHH:MM:SS.000Z
Returns: | the value of tweet.created_at_seconds converted into a string (YYYY-mm-ddTHH:MM:SS.000Z) |
---|---|
Return type: | str |
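A small sketch of how the three time representations relate, using only the standard library. The helper names are hypothetical, and whether the library returns a timezone-aware datetime is an assumption made here for clarity.

```python
from datetime import datetime, timezone


def seconds_to_datetime(seconds):
    """Convert seconds since the Unix epoch into a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def seconds_to_string(seconds):
    """Format seconds since the Unix epoch as YYYY-mm-ddTHH:MM:SS.000Z."""
    return seconds_to_datetime(seconds).strftime("%Y-%m-%dT%H:%M:%S.000Z")


created_at_datetime = seconds_to_datetime(1495657039)
created_at_string = seconds_to_string(1495657039)
```

With the created_at_seconds value from the Tweet example, the string form is "2017-05-24T20:17:19.000Z".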
embedded_tweet
¶Get the retweeted Tweet OR the quoted Tweet and return it as a Tweet object
Returns: | a Tweet representing the quote Tweet or the Retweet (see tweet_embeds.get_embedded_tweet, this is that value as a Tweet) |
---|---|
Return type: | Tweet (or None, if the Tweet is neither a quote tweet nor a Retweet) |
Raises: | NotATweetError – if embedded tweet is malformed |
favorite_count
¶The number of favorites that this tweet has received at the time of retrieval. If a tweet is obtained from a live stream, this will likely be 0.
Returns: | value returned by calling tweet_counts.get_favorite_count on self |
---|---|
Return type: | int |
follower_count
¶The number of followers that the author of the Tweet has
Returns: | the number of followers. value returned by calling get_follower_count on self |
---|---|
Return type: | int |
following_count
¶The number of accounts that the author of the Tweet is following
Returns: | the number of accounts that the author of the Tweet is following, value returned by calling get_following_count on self |
---|---|
Return type: | int |
generator
¶Get information about the application that generated the Tweet
Returns: | keys are ‘link’ and ‘name’, the link to and name of the application that generated the Tweet. value returned by calling tweet_generator.get_generator on self |
---|---|
Return type: | dict |
geo_coordinates
¶The user’s geo coordinates, if they are included in the payload (otherwise return None). Dictionary with the keys “latitude” and “longitude” or None
Returns: | value returned by calling tweet_geo.get_geo_coordinates on self |
---|---|
Return type: | dict |
gnip_matching_rules
¶Get the Gnip tagged rules that this tweet matched.
Returns: | List of potential tags with the matching rule or None if no rules are defined. |
---|
hashtags
¶A list of hashtags in the Tweet. Note that in the case of a quote-tweet, this does not return the hashtags in the quoted status. The recommended way to get that list would be to use tweet.quoted_tweet.hashtags
Returns: | list of all of the hashtags in the Tweet (value returned by calling tweet_entities.get_hashtags on self) |
---|---|
Return type: | list (a list of strings) |
id
¶Tweet snowflake id as a string
Returns: | Twitter snowflake id, numeric only (no other text) |
---|---|
Return type: | str |
Example
>>> from tweet_parser.tweet import Tweet
>>> original_format_dict = {
... "created_at": "Wed May 24 20:17:19 +0000 2017",
... "id": 867474613139156993,
... "id_str": "867474613139156993",
... "user": {"user_keys":"user_data"},
... "text": "some tweet text"
... }
>>> Tweet(original_format_dict).id
'867474613139156993'
>>> activity_streams_dict = {
... "postedTime": "2017-05-24T20:17:19.000Z",
... "id": "tag:search.twitter.com,2005:867474613139156993",
... "actor": {"user_keys":"user_data"},
... "body": "some tweet text"
... }
>>> Tweet(activity_streams_dict).id
'867474613139156993'
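The two examples above suggest how a numeric-only id can be recovered from either payload format. The following is a sketch under that assumption, not the library's code; `extract_tweet_id` is a hypothetical helper.

```python
def extract_tweet_id(tweet_dict):
    """Return the numeric snowflake id of a Tweet payload as a string."""
    # Original format carries the id as a string in "id_str".
    if "id_str" in tweet_dict:
        return tweet_dict["id_str"]
    # Activity-streams ids look like "tag:search.twitter.com,2005:<digits>",
    # so keep only the part after the last colon.
    return str(tweet_dict["id"]).rsplit(":", 1)[-1]


original_id = extract_tweet_id(
    {"id": 867474613139156993, "id_str": "867474613139156993"}
)
activity_streams_id = extract_tweet_id(
    {"id": "tag:search.twitter.com,2005:867474613139156993"}
)
```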
in_reply_to_screen_name
¶The screen name of the user being replied to (None if the Tweet isn’t a reply)
Returns: | value returned by calling tweet_reply.get_in_reply_to_screen_name on self |
---|---|
Return type: | str |
in_reply_to_status_id
¶The status id of the Tweet being replied to (None if the Tweet isn’t a reply)
Returns: | value returned by calling tweet_reply.get_in_reply_to_status_id on self |
---|---|
Return type: | str |
in_reply_to_user_id
¶The user id of the user being replied to (None if the Tweet isn’t a reply). This raises a NotAvailableError for activity-streams format
Returns: | value returned by calling tweet_reply.get_in_reply_to_user_id on self |
---|---|
Return type: | str |
klout_id
¶(DEPRECATED): The Klout ID of the user (str) (if it exists)
Returns: | value returned by calling tweet_user.get_klout_id on self (if no Klout is present, this returns a None) |
---|---|
Return type: | str |
klout_influence_topics
¶(DEPRECATED): Get the user’s Klout influence topics (a list of dicts), if it exists. Topic dicts will have these keys: url, id, name, score
Returns: | value returned by calling tweet_user.get_klout_topics(self, topic_type = ‘influence’) (if no Klout is present, this returns a None) |
---|---|
Return type: | list |
klout_interest_topics
¶(DEPRECATED): Get the user’s Klout interest topics (a list of dicts), if it exists. Topic dicts will have these keys: url, id, name, score
Returns: | value returned by calling tweet_user.get_klout_topics(self, topic_type = ‘interest’) (if no Klout is present, this returns a None) |
---|---|
Return type: | list |
klout_profile
¶(DEPRECATED): The Klout profile URL of the user (str) (if it exists)
Returns: | value returned by calling tweet_user.get_klout_profile on self (if no Klout is present, this returns a None) |
---|---|
Return type: | str |
klout_score
¶(DEPRECATED): The Klout score (int) (if it exists) of the user who posted the Tweet
Returns: | value returned by calling tweet_user.get_klout_score on self (if no Klout is present, this returns a None) |
---|---|
Return type: | int |
lang
¶The language that the Tweet is written in.
Returns: | 2-letter BCP 47 language code (or None if undefined). Value returned by calling tweet_text.get_lang on self |
---|---|
Return type: | str |
media_urls
¶A list of all media (https) urls in the tweet, useful for grabbing photo/video urls for other purposes.
Returns: | list of all of the media urls in the Tweet (value returned by calling tweet_entities.get_media_urls on self) |
---|---|
Return type: | list (a list of strings) |
most_unrolled_urls
¶For each url included in the Tweet “urls”, get the most unrolled version available. Only return 1 url string per url in tweet.tweet_links. In order of preference for “most unrolled” (keys from the dict at tweet.tweet_links):
Returns: | list of urls (value returned by calling tweet_links.get_most_unrolled_urls on self) |
---|---|
Return type: | list (a list of strings) |
name
¶The display name of the user who posted the Tweet
Returns: | value returned by calling tweet_user.get_name on self |
---|---|
Return type: | str |
poll_options
¶The text in the options of a poll as a list. If there is no poll in the Tweet, return an empty list. If activity-streams format, raise NotAvailableError
Returns: | value returned by calling tweet_text.get_poll_options on self |
---|---|
Return type: | list (list of strings) |
profile_location
¶User’s derived location data from the profile location enrichment. If unavailable, returns None.
Returns: | value returned by calling tweet_geo.get_profile_location on self |
---|---|
Return type: | dict |
Example
>>> result = {"country": "US", # Two letter ISO-3166 country code
... "locality": "Boulder", # The locality location (~ city)
... "region": "Colorado", # The region location (~ state/province)
... "sub_region": "Boulder", # The sub-region location (~ county)
... "full_name": "Boulder, Colorado, US", # The full name (excluding sub-region)
... "geo": [40,-105] # lat/long coordinate that corresponds to
... # the lowest granularity location for where the user
... # who created the Tweet is from
... }
quote_count
¶The number of tweets that this tweet has been quoted in at the time of retrieval. If a tweet is obtained from a live stream, this will likely be 0. This raises a NotAvailableError for activity-streams format
Returns: | value returned by calling tweet_counts.get_quote_count on self or raises NotAvailableError |
---|---|
Return type: | int |
quote_or_rt_text
¶The quoted or retweeted text in a Tweet (this is not the text entered by the posting user)
Returns: | value returned by calling tweet_text.get_quote_or_rt_text on self |
---|---|
Return type: | str |
quoted_tweet
¶The quoted Tweet as a Tweet object. If the Tweet is not a quote Tweet, return None. If the quoted Tweet payload cannot be loaded as a Tweet, this will raise a “NotATweetError”.
Returns: | A Tweet representing the quoted status (or None) (see tweet_embeds.get_quote_tweet, this is that value as a Tweet) |
---|---|
Return type: | Tweet |
Raises: | NotATweetError – if quoted tweet is malformed |
retweet_count
¶The number of times this tweet has been retweeted at the time of retrieval. If a tweet is obtained from a live stream, this will likely be 0.
Returns: | value returned by calling tweet_counts.get_retweet_count on self |
---|---|
Return type: | int |
retweeted_tweet
¶The retweeted Tweet as a Tweet object. If the Tweet is not a Retweet, return None. If the Retweet payload cannot be loaded as a Tweet, this will raise a NotATweetError.
Returns: | A Tweet representing the retweeted status (or None) (see tweet_embeds.get_retweet, this is that value as a Tweet) |
---|---|
Return type: | Tweet |
Raises: | NotATweetError – if retweeted tweet is malformed |
screen_name
¶The screen name (@ handle) of the user who posted the Tweet
Returns: | value returned by calling tweet_user.get_screen_name on self |
---|---|
Return type: | str |
text
¶The contents of “text” (original format) or “body” (activity streams format)
Returns: | value returned by calling tweet_text.get_text on self |
---|---|
Return type: | str |
tweet_links
¶The links that are included in the Tweet as “urls” (if there are no links, this is an empty list). This includes links that are included in quoted or retweeted Tweets. Returns unrolled or expanded_url information if it is available.
Returns: | A list of dictionaries containing information about urls. Each dictionary entity can have these keys; without unwound url or expanded url Twitter data enrichments many of these fields will be missing. (value returned by calling tweet_links.get_tweet_links on self) |
---|---|
Return type: | list (list of dicts) |
Example
>>> result = [
... {
... # url that shows up in the tweet text
... 'display_url': "https://twitter.com/RobotPrinc...",
... # long (expanded) url
... 'expanded_url': "https://twitter.com/RobotPrincessFi",
... # characters where the display link is
... 'indices': [55, 88],
... 'unwound': {
... # description from the linked webpage
... 'description': "the Twitter profile of RobotPrincessFi",
... 'status': 200,
... # title of the webpage
... 'title': "the Twitter profile of RobotPrincessFi",
... # long (expanded) url
... 'url': "https://twitter.com/RobotPrincessFi"},
... # the url that tweet directs to, often t.co
... 'url': "t.co/1234"}]
tweet_type
¶The type of Tweet this is (3 options: tweet, quote, and retweet)
Returns: | (“tweet”, “quote” or “retweet” only) value returned by calling tweet_text.get_tweet_type on self |
---|---|
Return type: | str |
user_entered_text
¶The text that the posting user entered
tweet: untruncated (includes @-mention replies and long links) text of an original Tweet
quote tweet: untruncated poster-added content in a quote-tweet
retweet: empty string
Returns: | if tweet.tweet_type == “retweet”, returns an empty string; otherwise, returns the value of tweet_text.get_full_text(self) |
---|---|
Return type: | str |
user_id
¶The Twitter ID of the user who posted the Tweet
Returns: | value returned by calling tweet_user.get_user_id on self |
---|---|
Return type: | str |
user_mentions
¶The @-mentions in the Tweet as dictionaries. Note that in the case of a quote-tweet, this does not return the users mentioned in the quoted status. The recommended way to get that list would be to use ‘tweet.quoted_tweet.user_mentions’. Also note that in the case of a quote-tweet, the list of @-mentioned users does not include the user who authored the original (quoted) Tweet; you can get the author of the quoted tweet using tweet.quoted_tweet.user_id
Returns: | 1 item per @ mention, value returned by calling tweet_entities.get_user_mentions on self |
---|---|
Return type: | list (list of dicts) |
Example
>>> result = {
... #characters where the @ mention appears
... "indices": [14,26],
... #id of @ mentioned user as a string
... "id_str": "2382763597",
... #screen_name of @ mentioned user
... "screen_name": "notFromShrek",
... #display name of @ mentioned user
... "name": "Fiona",
... #id of @ mentioned user as an int
... "id": 2382763597
... }
Validation and checking methods for Tweets.
Methods here are primarily used by other methods within this module but can be used for other validation code as well.
tweet_parser.tweet_checking.
check_tweet
(tweet, validation_checking=False)[source]¶Ensures a tweet is valid and determines the type of format for the tweet.
Parameters: |
|
---|
tweet_parser.tweet_checking.
get_all_keys
(tweet, parent_key='')[source]¶Takes a tweet object and recursively returns a list of all keys contained in this level and all nested levels of the tweet.
Parameters: |
|
---|---|
Returns: | list of all keys in nested dicts. |
Example
>>> import tweet_parser.tweet_checking as tc
>>> tweet = {"created_at": 124125125125, "text": "just setting up my twttr",
... "nested_field": {"nested_1": "field", "nested_2": "field2"}}
>>> tc.get_all_keys(tweet)
['created_at', 'text', 'nested_field nested_1', 'nested_field nested_2']
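The recursion described above can be sketched as below. This is an illustration that reproduces the example output (nested keys joined to their parent with a space, and only leaf keys listed); `get_all_keys_sketch` is a hypothetical name and the library's handling of edge cases may differ.

```python
def get_all_keys_sketch(tweet, parent_key=""):
    """Return the keys of a nested dict, prefixing nested keys with their parent."""
    keys = []
    for key, value in tweet.items():
        full_key = f"{parent_key} {key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested dicts, carrying the path so far.
            keys.extend(get_all_keys_sketch(value, parent_key=full_key))
        else:
            keys.append(full_key)
    return keys


tweet = {"created_at": 124125125125, "text": "just setting up my twttr",
         "nested_field": {"nested_1": "field", "nested_2": "field2"}}
all_keys = get_all_keys_sketch(tweet)
```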
tweet_parser.tweet_checking.
is_original_format
(tweet)[source]¶Simple checker to flag the format of a tweet.
Parameters: | tweet (Tweet) – tweet in question |
---|---|
Returns: | Bool |
Example
>>> import tweet_parser.tweet_checking as tc
>>> tweet = {"created_at": 124125125125,
... "text": "just setting up my twttr",
... "nested_field": {"nested_1": "field", "nested_2": "field2"}}
>>> tc.is_original_format(tweet)
True
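One way such a check could work, assuming the two formats are told apart by the top-level timestamp keys seen in the examples in this document ("created_at" for original format, "postedTime" for activity streams). The function name is hypothetical, and `ValueError` stands in for whatever error the library raises.

```python
def is_original_format_sketch(tweet):
    """Return True for original format, False for activity-streams format."""
    if "created_at" in tweet:
        return True
    if "postedTime" in tweet:
        return False
    # Assumption: a payload with neither key is not treated as a Tweet.
    raise ValueError("neither 'created_at' nor 'postedTime' is present")


original = is_original_format_sketch({"created_at": 124125125125, "text": "hi"})
activity_streams = is_original_format_sketch(
    {"postedTime": "2017-05-24T20:17:19.000Z", "body": "hi"}
)
```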
tweet_parser.tweet_checking.
key_validation_check
(tweet_keys_list, superset_keys, minset_keys)[source]¶Validates the keys present in a Tweet.
Parameters: |
|
---|---|
Returns: | 0 if no errors |
Raises: | UnexpectedFormatError – on any mismatch of keys |
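A sketch of this kind of key validation, under the assumption that superset_keys lists every key that may appear and minset_keys lists the keys that must appear. The parameter semantics are not spelled out here, so treat this as illustrative; `UnexpectedKeysError` is a local stand-in for the library's UnexpectedFormatError.

```python
class UnexpectedKeysError(Exception):
    """Local stand-in for the library's UnexpectedFormatError."""


def key_validation_sketch(tweet_keys_list, superset_keys, minset_keys):
    """Return 0 if the keys are acceptable, otherwise raise UnexpectedKeysError."""
    tweet_keys = set(tweet_keys_list)
    missing = set(minset_keys) - tweet_keys        # required keys that are absent
    unexpected = tweet_keys - set(superset_keys)   # keys outside the allowed set
    if missing or unexpected:
        raise UnexpectedKeysError(
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return 0


result = key_validation_sketch(["id", "text"], ["id", "text", "lang"], ["id"])
```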