Module containing the functions used for API calls, rule generation, and related tasks.
searchtweets.api_utils.gen_rule_payload(pt_rule, results_per_call=None, from_date=None, to_date=None, count_bucket=None, tag=None, stringify=True)
Generates the dict or JSON payload for a PowerTrack rule.
| Parameters: |
|
|---|
Example
>>> from searchtweets.api_utils import gen_rule_payload
>>> gen_rule_payload("beyonce has:geo",
... from_date="2017-08-21",
... to_date="2017-08-22")
'{"query":"beyonce has:geo","maxResults":100,"toDate":"201708220000","fromDate":"201708210000"}'
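How the example output above could be assembled can be sketched as follows. This is a minimal stand-in, not the library's code: the name sketch_rule_payload is hypothetical, only the YYYY-mm-DD date form is handled, and the default of 100 for maxResults is inferred from the example output rather than from the source.

```python
import json
from datetime import datetime


def sketch_rule_payload(query, from_date=None, to_date=None,
                        results_per_call=100, stringify=True):
    """Hypothetical stand-in for gen_rule_payload (assumption, not library code)."""

    def to_api_date(value):
        # This sketch only accepts the YYYY-mm-DD form.
        return datetime.strptime(value, "%Y-%m-%d").strftime("%Y%m%d%H%M")

    payload = {"query": query, "maxResults": results_per_call}
    if to_date is not None:
        payload["toDate"] = to_api_date(to_date)
    if from_date is not None:
        payload["fromDate"] = to_api_date(from_date)
    # Compact separators give the space-free JSON seen in the example.
    if stringify:
        return json.dumps(payload, separators=(",", ":"))
    return payload


payload_json = sketch_rule_payload("beyonce has:geo",
                                   from_date="2017-08-21",
                                   to_date="2017-08-22")
payload_dict = sketch_rule_payload("beyonce has:geo", stringify=False)
```

With stringify=False the sketch returns the plain dict, which is convenient when the payload needs further edits before it is sent.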
searchtweets.api_utils.gen_params_from_config(config_dict)
Generates parameters for a ResultStream from a dictionary.
searchtweets.api_utils.infer_endpoint(rule_payload)
Infers which endpoint should be used for a given rule payload.
searchtweets.api_utils.convert_utc_time(datetime_str)
Converts a datetime argument to the GNIP API format, which is YYYYmmDDHHMM. Dates may be passed in any of the following formats:
- YYYYmmDDHHMM
- YYYY-mm-DD
- YYYY-mm-DD HH:MM
- YYYY-mm-DDTHH:MM
| Parameters: | datetime_str (str) – valid formats are listed above. |
|---|---|
| Returns: | string of GNIP API formatted date. |
Example
>>> from searchtweets.api_utils import convert_utc_time
>>> convert_utc_time("201708020000")
'201708020000'
>>> convert_utc_time("2017-08-02")
'201708020000'
>>> convert_utc_time("2017-08-02 00:00")
'201708020000'
>>> convert_utc_time("2017-08-02T00:00")
'201708020000'
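The conversion shown above can be approximated by trying each accepted format in turn. The sketch below is an assumption about one way to do it, not the library's implementation; sketch_convert_utc_time and ACCEPTED_FORMATS are hypothetical names.

```python
from datetime import datetime

# strptime patterns for the four input formats listed above.
ACCEPTED_FORMATS = (
    "%Y%m%d%H%M",       # YYYYmmDDHHMM
    "%Y-%m-%d",         # YYYY-mm-DD
    "%Y-%m-%d %H:%M",   # YYYY-mm-DD HH:MM
    "%Y-%m-%dT%H:%M",   # YYYY-mm-DDTHH:MM
)


def sketch_convert_utc_time(datetime_str):
    """Hypothetical stand-in: normalize a date string to YYYYmmDDHHMM."""
    for fmt in ACCEPTED_FORMATS:
        try:
            parsed = datetime.strptime(datetime_str, fmt)
        except ValueError:
            continue  # not this format, try the next one
        return parsed.strftime("%Y%m%d%H%M")
    raise ValueError("unsupported date format: {!r}".format(datetime_str))


converted = [sketch_convert_utc_time(s) for s in (
    "201708020000", "2017-08-02", "2017-08-02 00:00", "2017-08-02T00:00")]
```

A date given without a time defaults to midnight, which matches the outputs in the example.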
searchtweets.api_utils.validate_count_api(rule_payload, endpoint)
Ensures that the counts API is set correctly in a payload.
searchtweets.api_utils.change_to_count_endpoint(endpoint)
Utility function to change a normal endpoint to a count API endpoint. Returns the same endpoint if it is already a valid count endpoint.
| Parameters: | endpoint (str) – your API endpoint. |
|---|---|
| Returns: | the modified endpoint for a count endpoint. |
| Return type: | str |
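The behaviour described for change_to_count_endpoint can be illustrated with a small sketch. It rests on assumptions that are not stated in this document: that a search endpoint URL ends in ".json" and that the matching count endpoint is the same path followed by "/counts.json". The function name and the example URL are hypothetical.

```python
def sketch_change_to_count_endpoint(endpoint):
    """Hypothetical stand-in; the URL layout is an assumption, see above."""
    base = endpoint.rstrip("/")
    if base.endswith("/counts.json"):
        # Already a count endpoint: hand it back unchanged.
        return endpoint
    if base.endswith(".json"):
        base = base[: -len(".json")]
    return base + "/counts.json"


search_url = "https://api.example.com/search/30day/dev.json"  # made-up URL
count_url = sketch_change_to_count_endpoint(search_url)
same_url = sketch_change_to_count_endpoint(count_url)
```

Applying the function twice gives the same result as applying it once, which is the property the description above asks for.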
This module contains the request handling and the actual API wrapping functionality.
Its core component is the ResultStream class, which takes the API call
arguments and returns a stream of results to the user.
searchtweets.result_stream.ResultStream(endpoint, rule_payload, username=None, password=None, bearer_token=None, extra_headers_dict=None, max_results=500, tweetify=True, max_requests=None, **kwargs)
Bases: object
Class to represent an API query. It handles two major pieces of functionality: wrapping metadata around a specific API call and automatic pagination of results.
| Parameters: |
|
|---|
Example
>>> rs = ResultStream(**search_args, rule_payload=rule, max_requests=1)
>>> results = list(rs.stream())
execute_request()
Sends the request to the API and parses the JSON response. Makes some assumptions about the session length and sets the presence of a “next” token.
session_request_counter = 0
stream()
Main entry point for the data from the API. Automatically paginates
through the results via the next token and returns up to max_results
tweets or makes up to max_requests API calls, whichever limit is reached first.
>>> result_stream = ResultStream(**kwargs)
>>> stream = result_stream.stream()
>>> results = list(stream)
>>> # or for faster usage...
>>> results = list(ResultStream(**kwargs).stream())
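The pagination behaviour described for stream() can be sketched as a generator that follows a next token until either limit is reached. This is an illustration under stated assumptions, not the ResultStream code: sketch_stream, fake_fetch, and the page layout are all hypothetical, and the real class performs HTTP requests instead of dictionary lookups.

```python
def sketch_stream(fetch_page, max_results=500, max_requests=None):
    """Hypothetical pagination loop.

    fetch_page(next_token) must return (results, next_token_or_None).
    """
    yielded = 0
    requests_made = 0
    next_token = None
    while True:
        if max_requests is not None and requests_made >= max_requests:
            return  # request budget used up
        results, next_token = fetch_page(next_token)
        requests_made += 1
        for item in results:
            if yielded >= max_results:
                return  # result budget used up
            yield item
            yielded += 1
        if next_token is None or yielded >= max_results:
            return  # no further pages, or enough results


# A fake three-page API keyed by next token (made up for this sketch).
PAGES = {
    None: (["a", "b"], "t1"),
    "t1": (["c", "d"], "t2"),
    "t2": (["e", "f"], None),
}


def fake_fetch(next_token):
    return PAGES[next_token]


everything = list(sketch_stream(fake_fetch))
first_five = list(sketch_stream(fake_fetch, max_results=5))
two_calls = list(sketch_stream(fake_fetch, max_requests=2))
```

Because the function is a generator, no request is made until the caller starts iterating, and iteration can be abandoned early without fetching the remaining pages.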
searchtweets.result_stream.collect_results(rule, max_results=500, result_stream_args=None)
Utility function to quickly get a list of tweets from a ResultStream
without keeping the object around. Requires your arguments to be configured
before use.
| Parameters: |
|
|---|---|
| Returns: | list of results |
Example
>>> from searchtweets import collect_results
>>> tweets = collect_results(rule,
...                          max_results=500,
...                          result_stream_args=search_args)
Utility functions that are used in various parts of the program.
searchtweets.utils.take(n, iterable)
Returns the first n items of the iterable as a list. Originally found in the Python itertools documentation.
| Parameters: |
|
|---|
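For reference, the itertools recipe that take is described as coming from fits in one line. The sketch below reproduces that recipe under the hypothetical name sketch_take; whether the library's body is identical is an assumption.

```python
from itertools import islice


def sketch_take(n, iterable):
    """Return the first n items of the iterable as a list (itertools recipe)."""
    return list(islice(iterable, n))


first_three = sketch_take(3, range(10))
short_input = sketch_take(5, [1, 2])  # fewer than n items are available
```

islice stops as soon as n items have been produced, so this also works on generators that never end.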
searchtweets.utils.partition(iterable, chunk_size, pad_none=False)
Adapted from Toolz. Breaks an iterable into tuples of chunk_size items each. With pad_none=True, a final incomplete chunk is padded with None; otherwise it is dropped.
Example
>>> from searchtweets.utils import partition
>>> iter_ = range(10)
>>> list(partition(iter_, 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
>>> list(partition(iter_, 3, pad_none=True))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
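The chunking shown in the example can be reproduced with the shared-iterator idiom that Toolz uses for its own partition. The sketch below is a reconstruction under the hypothetical name sketch_partition, not a copy of the library's code.

```python
from itertools import zip_longest


def sketch_partition(iterable, chunk_size, pad_none=False):
    """Hypothetical stand-in: split an iterable into chunk_size-tuples."""
    # The same iterator repeated chunk_size times: each tuple slot pulls
    # the next item, so consecutive items land in the same tuple.
    slots = [iter(iterable)] * chunk_size
    if pad_none:
        return zip_longest(*slots)  # fills the last tuple with None
    return zip(*slots)              # drops an incomplete last tuple


dropped = list(sketch_partition(range(10), 3))
padded = list(sketch_partition(range(10), 3, pad_none=True))
```

Both branches return lazy iterators, so wrapping the call in list(), as in the example above, is what forces the chunks to be built.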
searchtweets.utils.merge_dicts(*dicts)
Merges dictionaries and returns a new dictionary.
| Parameters: | dicts (list or Iterable) – iterable set of dictionaries for merging. |
|---|---|
| Returns: | dict with all keys from the passed list. Later dictionaries in the sequence will override duplicate keys from previous dictionaries. |
| Return type: | dict |
Example
>>> from searchtweets.utils import merge_dicts
>>> d1 = {"rule": "something has:geo"}
>>> d2 = {"maxResults": 1000}
>>> merge_dicts(*[d1, d2])
{"maxResults": 1000, "rule": "something has:geo"}
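The override rule stated above (later dictionaries win on duplicate keys) can be sketched with a left-to-right fold. sketch_merge_dicts is a hypothetical name, and the library may implement the merge differently.

```python
from functools import reduce


def sketch_merge_dicts(*dicts):
    """Hypothetical stand-in: merge dicts left to right into a new dict."""
    # {**left, **right} builds a fresh dict; keys in `right` override `left`.
    return reduce(lambda left, right: {**left, **right}, dicts, {})


d1 = {"rule": "something has:geo"}
d2 = {"maxResults": 1000}
merged = sketch_merge_dicts(d1, d2)
overridden = sketch_merge_dicts({"maxResults": 100}, d2)
```

None of the input dictionaries is modified, since every step of the fold creates a new dict.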
searchtweets.utils.write_result_stream(result_stream, filename_prefix=None, results_per_file=None, **kwargs)
Wraps a ResultStream object to save it to a file. This function still
returns all data from the result stream, as a generator that wraps the
write_ndjson method.
| Parameters: |
|
|---|
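The idea of writing results to disk while still handing them to the caller can be sketched as a pass-through generator. This is an assumption about the general pattern only: sketch_write_ndjson is a hypothetical name, it does not split output across files as results_per_file suggests, and it is not the library's write_ndjson.

```python
import json
import os
import tempfile


def sketch_write_ndjson(filename, records):
    """Hypothetical stand-in: write one JSON object per line, yield each record."""
    with open(filename, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
            yield record  # the caller still receives every record


path = os.path.join(tempfile.mkdtemp(), "tweets.json")
tweets = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]

# Nothing is written until the generator is consumed.
seen = list(sketch_write_ndjson(path, iter(tweets)))

with open(path, encoding="utf-8") as handle:
    written = [json.loads(line) for line in handle]
```

The file is closed once the generator is exhausted, so a caller that stops iterating early should not assume the output file is complete.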
searchtweets.utils.read_config(filename)
Reads and flattens a configuration file into a single
dictionary for ease of use. Works with both .config and
.yaml files. Files should look like this:
search_rules:
    from-date: 2017-06-01
    to-date: 2017-09-01 01:01
    pt-rule: kanye
search_params:
    results-per-call: 500
    max-results: 500
output_params:
    save_file: True
    filename_prefix: kanye
    results_per_file: 10000000
or:
[search_rules]
from_date = 2017-06-01
to_date = 2017-09-01
pt_rule = beyonce has:geo
[search_params]
results_per_call = 500
max_results = 500
[output_params]
save_file = True
filename_prefix = beyonce
results_per_file = 10000000
| Parameters: | filename (str) – location of file with extension (‘.config’ or ‘.yaml’) |
|---|---|
| Returns: | parsed configuration dictionary. |
| Return type: | dict |
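Flattening the .config form into a single dictionary can be sketched with the standard-library configparser. The function name sketch_flatten_config is hypothetical, the sketch reads from a string instead of a file, and it leaves every value as a string; how the library handles types and the .yaml form is not covered here.

```python
import configparser

CONFIG_TEXT = """
[search_rules]
from_date = 2017-06-01
to_date = 2017-09-01
pt_rule = beyonce has:geo

[search_params]
results_per_call = 500
max_results = 500

[output_params]
save_file = True
filename_prefix = beyonce
"""


def sketch_flatten_config(text):
    """Hypothetical stand-in: merge all sections into one flat dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    flat = {}
    for section in parser.sections():
        # Section names are discarded; only the key/value pairs are kept.
        flat.update(dict(parser.items(section)))
    return flat


flat_config = sketch_flatten_config(CONFIG_TEXT)
```

Because section names are dropped, two sections that define the same key would collide, with the later section winning.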