A Quick Guide to Twitter Developer API - Twitter API v2 - Get Tweets Data / Create Tweets with Python Library Tweepy

To use the Twitter API, you need a developer account, which you can create at developer.twitter.com. The Twitter API has three access levels, so check which one suits your needs.

The 'Essential' level is available to anyone who signs up for a developer account. Once your email is verified, you can start a project immediately. This level has a cap of 500k tweets per month. If that is not enough, you can apply for a use case review for the higher 'Elevated' level, which grants access to 2 million tweets per month. Both levels are free, and both can access tweets from the past seven days.

Twitter also offers a higher-level free developer account for academic research. Those who qualify include master's and PhD students, professors, and academic researchers. With an 'Academic Research' account, developers get full archive access going back to 2006, when Twitter started, and up to 10 million tweets per month.

I only created an Essential account for this project.

After creating a developer account, you can see the following three keys and tokens:
1. Consumer API key
2. Consumer API key secret
3. Authentication bearer token
They give you secure access to the Twitter API and should be kept private and safe. They are only good for fetching tweets from the archive. If you need to create, retweet, or like tweets, you must give your application additional permissions, such as 'read and write', and generate your own authentication access token and secret.

Here is how Twitter describes its OAuth 2.0 and OAuth 1.0a authentication methods:

OAuth 2.0 and OAuth 1.0a are authentication methods that allow users to sign in to your App with Twitter. They also allow your App to make specific requests on behalf of authenticated users. You can turn on one, or both methods.

Before generating the authentication tokens (the access token and secret, found under the Projects & Apps tab, then the Settings tab), click the 'Edit' button for 'User Authentication Settings'. All of these pages are in the developer portal dashboard.

For my Twitter project, I turned on OAuth 1.0a and chose the 'read and write' permission. Then enter a callback URL of your choice; I use https://example.com as my callback URL. After saving all the settings, we are ready to design the app.

Save all the Credentials:

To safeguard the credentials for this Twitter app project, I created a config.py file in the project folder and saved each key and token as a constant. There are five of them at this stage: API key, API key secret, Bearer token, Access token, and Access token secret.
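If you would rather not write the secrets into a file at all, one option is to have config.py read them from environment variables. The sketch below is only one way to do it: the environment variable names (TWITTER_API_KEY and so on) and the `missing` list are my own assumptions, while the constant names match the ones used in the rest of this post.

```python
# config.py: a sketch that reads the five credentials from environment variables.
# The environment variable names are assumptions; choose whatever names you like.
import os

API_key = os.environ.get("TWITTER_API_KEY", "")
API_key_secret = os.environ.get("TWITTER_API_KEY_SECRET", "")
Bearer_token = os.environ.get("TWITTER_BEARER_TOKEN", "")
Access_token = os.environ.get("TWITTER_ACCESS_TOKEN", "")
Access_token_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "")

# Names of credentials that are still unset, handy for a start-up check.
credentials = {
    "API_key": API_key,
    "API_key_secret": API_key_secret,
    "Bearer_token": Bearer_token,
    "Access_token": Access_token,
    "Access_token_secret": Access_token_secret,
}
missing = [name for name, value in credentials.items() if not value]
```

Whichever approach you pick, keep config.py and the secrets themselves out of version control.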

Install Tweepy Library:

As Tweepy explains, it is a Python library for accessing the Twitter API. It can be installed with pip install tweepy. I use this library to simplify working with the Twitter API.

The first step in using Tweepy is to create a Client instance for authentication. For now, only the bearer token is required as a parameter. The Client class was added in Tweepy 4.0 and is the interface for Twitter API v2 specifically. It handles OAuth 2.0 Bearer Token (application-only) and OAuth 1.0a User Context authentication for you. Older versions of Tweepy only had OAuthHandler-type classes for authentication. With a Client instance we can create and delete tweets, like or unlike a tweet, search all tweets or recent tweets, and so on.

With Tweepy's Client, we receive data in batches through repeated requests to the Twitter API, as might be expected from a REST API. However, if you wish to open a single connection between your app and the API and have new results sent through it whenever new matches occur, you can use Tweepy's StreamingClient instead (or Stream for Twitter API v1.1). Running the stream in its own thread is preferred, so that the rest of the app is not blocked while it waits for tweets.

For the purpose of this project though, I only use Client for requests. Let's try to search recent tweets. We need to define a query or a rule for the search.

import tweepy
import config

# retrieve bearer token:
bearer_token = config.Bearer_token
# print(bearer_token)

# create a tweepy client
client = tweepy.Client(bearer_token=bearer_token)

# build a query: all English-language retweets containing the keyword 'dance'
query = 'dance lang:en is:retweet'
# parameter options:
# [query,start_time,end_time,since_id,until_id,max_results,next_token,pagination_token,
# sort_order,expansions,tweet.fields,media.fields,poll.fields,place.fields,user.fields]
client_response = client.search_recent_tweets(query=query, max_results=10)

# explore the data field inside the response (data is None when nothing matches)
data = client_response.data or []
ids = [d['id'] for d in data]
texts = [d['text'] for d in data]
print(ids)

# explore includes fields  inside response
includes = client_response.includes
print(includes.keys()) # output is dict_keys([])

If we also print the texts, we can see that the text of the tweets needs further cleaning to remove empty lines and extra spaces.
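As a minimal sketch of that clean-up, the helper below collapses every run of whitespace, including blank lines, into a single space. The function name clean_text and the sample string are made up for illustration; in the script above it would be applied to each item of texts.

```python
def clean_text(text):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return " ".join(text.split())


# A made-up tweet text with blank lines and extra spaces.
sample = "RT @someone:  I love   to dance\n\n\nevery day  "
cleaned = clean_text(sample)
print(cleaned)  # RT @someone: I love to dance every day
```

For the search results, something like cleaned_texts = [clean_text(t) for t in texts] would do the same for the whole batch.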

The above search is very basic and does not use many of the other parameters available for defining a search.
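The query itself is just a string of operators separated by spaces, where a space means AND. As a small sketch, the hypothetical helper below (build_query is my own name, not part of Tweepy) assembles the operators used in this post: a keyword, lang:, and is:retweet or its negation.

```python
def build_query(keyword, lang=None, retweets=None):
    """Join search operators with spaces (a space means AND).

    retweets=True keeps only retweets, False excludes them,
    and None leaves retweets unfiltered.
    """
    parts = [keyword]
    if lang:
        parts.append(f"lang:{lang}")
    if retweets is True:
        parts.append("is:retweet")
    elif retweets is False:
        parts.append("-is:retweet")
    return " ".join(parts)


print(build_query("dance", lang="en", retweets=True))  # dance lang:en is:retweet
print(build_query("covid", retweets=False))            # covid -is:retweet
```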

Use Expansion for Twitter Tweets' and User's Detailed Information

search_recent_tweets() has an optional parameter called 'expansions'. It expands an object's ID into the full object: you specify a comma-separated list of the fields to expand within the expansions request parameter, and in Tweepy a Python list works as well.

For example, if we expand author_id and referenced_tweets.id, the following code shows how the Twitter API loads the corresponding data into the client_response instance. 'users' and 'tweets' should now appear inside the 'includes' field:

client_response = client.search_recent_tweets(
    query=query,
    max_results=10,
    expansions=['author_id', 'referenced_tweets.id', 'geo.place_id'],
)

# explore the data field inside the response (data is None when nothing matches)
data = client_response.data or []
ids = [d['id'] for d in data]
texts = [d['text'] for d in data]
print(ids)

# explore includes fields  inside response
includes = client_response.includes
print(includes.keys()) # output is dict_keys(['users', 'tweets'])

Using your IDE's debugger, we can clearly see how the data structure of the Response returned by Tweepy's Client is built hierarchically.

Set a breakpoint near the end of the application and run it in debug mode. We can see that 'includes' is a dictionary with 'users' and 'tweets' as its keys.

Although 'geo.place_id' is one of the requested expansions, 'includes' does not show a 'places' key. I attribute this to the project using only an 'Essential' account; to view 'places' from the Twitter API, we may need to apply for an 'Elevated' account. Note that the key is also absent whenever none of the returned tweets has a place attached.

Now, to request specific user or tweet fields, we can use 'user_fields' and 'tweet_fields'. The available fields are listed on the Twitter API doc page about 'fields': choose User for user fields and Tweet for tweet fields.

User has 3 default fields: id, name and username. I will add a new field, 'profile_image_url'.

Tweet has 2 default fields: id and text. I will add a new field, 'author_id'.

client_response = client.search_recent_tweets(
    query=query,
    max_results=10,
    expansions=['author_id', 'referenced_tweets.id', 'geo.place_id'],
    user_fields=['profile_image_url'],
    tweet_fields=['author_id'],
)

# explore the data field inside the response (data is None when nothing matches)
data = client_response.data or []
ids = [d['id'] for d in data]
texts = [d['text'] for d in data]
print(ids)

# explore includes fields  inside response
includes = client_response.includes
print(includes.keys()) # output is dict_keys(['users', 'tweets'])

for user in client_response.includes['users'][:1]:
    print(user.data.keys())
# output: dict_keys(['profile_image_url', 'username', 'name', 'id'])

for tw in includes['tweets'][:1]:
    print(tw.data.keys())
# output: dict_keys(['text', 'id', 'author_id'])

We added the 'user_fields' parameter with the 'profile_image_url' field. Now when we rerun the debugger and look under client_response -> includes -> users -> data, the new 'profile_image_url' field appears in the 'data' dictionary in addition to the 3 default user fields (id, name, username). We can therefore retrieve this URL through Python code.
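One way to retrieve it, sketched below, is to index the expanded users by id and then look up each tweet's author_id. The sample data is made up and uses plain dictionaries shaped like client_response.data and client_response.includes['users']; the same key-style access (d['id']) is what the code above uses on the real Tweepy objects.

```python
# Made-up sample data; the ids, names and URLs are placeholders.
tweets = [
    {"id": 1, "text": "first tweet", "author_id": 10},
    {"id": 2, "text": "second tweet", "author_id": 20},
    {"id": 3, "text": "third tweet", "author_id": 10},
]
users = [
    {"id": 10, "username": "alice", "profile_image_url": "https://example.com/a.png"},
    {"id": 20, "username": "bob", "profile_image_url": "https://example.com/b.png"},
]

# Index the users by id so each tweet's author can be looked up directly.
users_by_id = {user["id"]: user for user in users}

rows = []
for tweet in tweets:
    author = users_by_id.get(tweet["author_id"])
    image_url = author["profile_image_url"] if author else None
    rows.append((tweet["id"], image_url))

print(rows)
```

Building the dictionary once avoids scanning the users list again for every tweet.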

Twitter API doc has a detailed description about each object and its fields available for use.

How to get more than 10 tweets?

One parameter of search_recent_tweets() is 'max_results'. Its default value is 10, and setting it to 100 returns up to 100 tweets per request. To get more than that, we can use Tweepy's Paginator class, which keeps requesting pages until we have as many tweets as we need. Check out the code below.

import tweepy
import config

# retrieve bearer token:
bearer_token = config.Bearer_token
# print(bearer_token)

# create a tweepy client
client = tweepy.Client(bearer_token=bearer_token)

# build a query: all English-language retweets containing the keyword 'dance'
query = 'dance lang:en is:retweet'

# use pagination:
# create a generator that yields up to 1000 tweets, fetched 100 per request
results = tweepy.Paginator(client.search_recent_tweets, query=query, max_results=100).flatten(limit=1000)
with open('tweet_id.txt', 'a+') as file:
    for tw in results:
        file.write(str(tw.id) + '\n')

This creates a text file with 1000 tweet IDs stored in it.
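To make the idea behind pagination concrete, here is a sketch of the underlying loop: request a page, hand out its items, then follow the next_token from the page's meta until the limit is reached or no token is left. This is not Tweepy's actual implementation; fetch_page, iter_tweets and the PAGES data are all invented stand-ins so the example runs without network access.

```python
# Fake "API" for illustration only: three pages of made-up tweet ids.
PAGES = {
    None: {"data": [1, 2, 3], "meta": {"next_token": "page2"}},
    "page2": {"data": [4, 5, 6], "meta": {"next_token": "page3"}},
    "page3": {"data": [7, 8], "meta": {}},  # last page: no next_token
}


def fetch_page(next_token=None):
    """Stand-in for one request such as client.search_recent_tweets(...)."""
    return PAGES[next_token]


def iter_tweets(limit):
    """Yield items page by page until `limit` items or the last page."""
    yielded = 0
    next_token = None
    while True:
        page = fetch_page(next_token)
        for item in page["data"]:
            if yielded >= limit:
                return
            yield item
            yielded += 1
        next_token = page["meta"].get("next_token")
        if next_token is None:
            return


print(list(iter_tweets(limit=5)))    # [1, 2, 3, 4, 5]
print(list(iter_tweets(limit=100)))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The second call shows that asking for more items than exist simply stops at the last page.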

How to get tweet volume counts for a query

For a trending word such as 'covid', we can see how its daily volume changed within the past 7 days.

# build a query: tweets containing the keyword 'covid', excluding retweets
query = 'covid -is:retweet'

# count tweet volume for a query; granularity is optional (defaults to 'hour')
results = client.get_recent_tweets_count(query=query, granularity='day')

# the response has 4 fields (data, includes, errors, meta);
# meta, the fourth, is a dict with a 'total_tweet_count' key
total = results.meta['total_tweet_count']
# print(total) # output: 1375910

# only show the first item of the 'data' field:
for bucket in results.data[:1]:
    print(bucket.keys())
    # output: dict_keys(['end', 'start', 'tweet_count'])

    print(bucket['tweet_count'])
    # output: 737 without granularity, or 51227 with granularity='day'
    print(f'{bucket["start"]} ~ {bucket["end"]}')
    # output: 2022-05-12T18:57:49.000Z ~ 2022-05-12T19:00:00.000Z
    # or 2022-05-12T19:05:11.000Z ~ 2022-05-13T00:00:00.000Z with granularity='day'
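To see how the daily volume changed, the buckets in the data field can be turned into a date-to-count mapping. The sketch below uses a made-up sample shaped like the counts response (a list of buckets plus a meta total); the numbers are placeholders, not real Twitter data.

```python
# Made-up sample shaped like the counts response: `data` is a list of
# buckets and `meta` carries the overall total.
sample = {
    "data": [
        {"start": "2022-05-12T19:05:11.000Z", "end": "2022-05-13T00:00:00.000Z", "tweet_count": 120},
        {"start": "2022-05-13T00:00:00.000Z", "end": "2022-05-14T00:00:00.000Z", "tweet_count": 300},
        {"start": "2022-05-14T00:00:00.000Z", "end": "2022-05-15T00:00:00.000Z", "tweet_count": 80},
    ],
    "meta": {"total_tweet_count": 500},
}

# Key each bucket by the date part of its start timestamp (YYYY-MM-DD).
per_day = {bucket["start"][:10]: bucket["tweet_count"] for bucket in sample["data"]}

total = sum(per_day.values())
busiest_day = max(per_day, key=per_day.get)

print(per_day)
print(total == sample["meta"]["total_tweet_count"])  # True
print(busiest_day)  # 2022-05-13
```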

How to get a user's timeline

Elon Musk has been in the news constantly these days, so I would like to check his timeline for the past 7 days.

# get timeline of a user with its username
user = client.get_user(username='elonmusk')
# user=> Response(data=<User id=44196397 
# name=Elon Musk username=elonmusk>, includes={}, errors=[], meta={})

# we need the id to get all the tweets
user_id = user.data['id'] # 44196397

# we can still use the same params used before for detailed tweet or user info
tweets = client.get_users_tweets(user_id, max_results=100, tweet_fields=['possibly_sensitive'])
print(len(tweets.data)) # 21 tweets within last 7 days by Elon Musk

# print out each tweet id and sensitivity
for tw in tweets.data:
    print(tw.id)
    print(tw.possibly_sensitive)
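To make the seven-day window explicit rather than rely on the endpoint's defaults, a start_time can be passed to get_users_tweets(). The sketch below only computes that timestamp; days_ago is a hypothetical helper name, and the commented-out call assumes the client and user_id created earlier.

```python
from datetime import datetime, timedelta, timezone


def days_ago(days, now=None):
    """Return a timezone-aware UTC datetime `days` days before `now`."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


start_time = days_ago(7)
print(start_time.isoformat())

# Assumed usage with the client created earlier (not run here):
# tweets = client.get_users_tweets(user_id, max_results=100, start_time=start_time)
```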

Now, let's use Tweepy and the Twitter API to create new tweets on Twitter remotely. This needs a new Client instance with four credentials, including the access token and secret. As an example, the create_tweet() method is used below, and it returns a new Response.

# for create tweets, like tweets etc, create a new client 
client_1 = tweepy.Client(consumer_key=config.API_key, 
    consumer_secret=config.API_key_secret,
    access_token=config.Access_token,
    access_token_secret=config.Access_token_secret )

# create a tweet
response = client_1.create_tweet(
    text='Testing using tweepy and twitter API')

print(response) # a new tweet has been added to my own timeline on Twitter
# output: Response(
#   data={'id': '1527391151812005901', 
#           'text': 'Testing using tweepy and twitter API'}, 
#           includes={}, 
#           errors=[], 
#            meta={})

All of the above interactions with the Twitter API can be done with the most basic 'Essential' account. That's pretty cool already. There is a lot more that can be done with this API; this is just the tip of the iceberg.
