The world of social networks is, today, one of the largest free data sources available. When you think about Big Data, Twitter is probably the first example that comes to mind.

Like many other social networks, Twitter allows its users to post, comment, like and follow, expressing their opinions and receiving those of other users at the same time.

Now, imagine you are about to offer the market a new product and, before starting the whole campaign (which can be highly expensive), you want to analyze the sentiment, the “mood”, of your country, or a continent, or the whole world, about this product. By filtering all the tweets and retweets of your users’ sample with some keywords related to your product, you might discover that your business strategy is not the best. Hence, you might re-direct your analysis and test whether some changes could improve your idea, gathering new information and slightly modifying your filters.

And you can do everything I mentioned above in streaming, that is, in real time. With Tweepy, a Python library that connects to the Twitter API, you can keep track of the tweets you are interested in. Furthermore, you can apply any filter you want, from languages to keywords, from the number of followers to the number of likes.
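The core streaming idea can be sketched in plain Python, without the real Tweepy dependency: a listener object exposes a callback that the stream invokes for every incoming status, and the listener keeps only the ones matching its tracked keywords. The class and method names below are illustrative, loosely mirroring Tweepy’s StreamListener interface:

```python
class KeywordListener:
    """Collects incoming statuses that contain any of the tracked keywords."""

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]
        self.matches = []

    def on_status(self, text):
        # the stream would call this callback for every incoming tweet
        if any(k in text.lower() for k in self.keywords):
            self.matches.append(text)


listener = KeywordListener(keywords=['pizza', 'margherita'])
# simulate three statuses arriving from the stream
for status in ["Best pizza in town!", "Raining again...", "Margherita night"]:
    listener.on_status(status)

print(listener.matches)  # -> ['Best pizza in town!', 'Margherita night']
```

With the real library, Tweepy would invoke the callback for you once you start the stream with your filter keywords.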

With this article, I want to provide a real-time example of the tracking procedure. For this purpose, I will use Jupyter Notebook and I will set a very simple business case for my analysis. Let’s imagine you are about to open a new Italian pizza restaurant in San Francisco but, before proceeding with huge capital investments, you first want to test the sentiment of the market. Of course, your target will be the population of San Francisco, and the keywords you use will be related to the kind of menu you want to offer.

So let’s start by importing all the libraries we will need:

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
from textblob import TextBlob
import matplotlib.pyplot as plt
import tweepy
import re 

Now, you have to create your Twitter class and set your Twitter profile details. Note that these details will be available only if you create a developer profile (you can easily create one on Twitter’s developer site).

class TwitterClient(object):

    def __init__(self):
        # keys and tokens from your Twitter developer profile
        consumer_key = 'xxxxxxx'
        consumer_secret = 'xxxxxxx'
        access_token = 'xxxxxxx'
        access_token_secret = 'xxxxxxx'

        try:
            # authenticate and create the API object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            self.auth.set_access_token(access_token, access_token_secret)
            self.api = tweepy.API(self.auth)
        except tweepy.TweepError:
            print("Error: Authentication Failed")
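Hard-coding keys in the source is fine for a quick demo, but a safer pattern is to read them from environment variables, so your secrets never end up in version control. A minimal sketch (the variable names here are my own convention, not something Tweepy requires):

```python
import os

# export these in your shell before running, e.g.
#   export TWITTER_CONSUMER_KEY='your-real-key'
# setdefault only provides a placeholder fallback for this demo
os.environ.setdefault('TWITTER_CONSUMER_KEY', 'xxxxxxx')
os.environ.setdefault('TWITTER_CONSUMER_SECRET', 'xxxxxxx')

consumer_key = os.environ['TWITTER_CONSUMER_KEY']
consumer_secret = os.environ['TWITTER_CONSUMER_SECRET']
print(consumer_key)
```

You would then pass `consumer_key` and `consumer_secret` to `OAuthHandler` instead of the hard-coded strings.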


Now, inside my TwitterClient class, I will define some methods which I’m going to explain:

    def clean_tweet(self, tweet):
        # utility method to remove mentions, special characters and links
        return ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def get_tweet_polarity(self, tweet):
        # polarity score in [-1, 1] computed by TextBlob
        analysis = TextBlob(self.clean_tweet(tweet))
        return analysis.sentiment.polarity

    def get_tweet_sentiment(self, tweet):
        # classify the polarity into positive, neutral or negative
        analysis = TextBlob(self.clean_tweet(tweet))
        if analysis.sentiment.polarity > 0:
            return 'positive'
        elif analysis.sentiment.polarity == 0:
            return 'neutral'
        else:
            return 'negative'

    def get_tweets(self, query, count=10):
        tweets = []

        try:
            # call the Twitter API to fetch tweets matching the query
            fetched_tweets = self.api.search(q=query, count=count)

            for tweet in fetched_tweets:
                parsed_tweet = {}

                parsed_tweet['text'] = tweet.text
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)
                parsed_tweet['polarity'] = self.get_tweet_polarity(tweet.text)

                # append retweeted tweets only once
                if tweet.retweet_count > 0:
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)

            return tweets

        except tweepy.TweepError as e:
            print("Error : " + str(e))

Let’s examine those functions one by one:

  • clean_tweet: a utility method which removes mentions, special characters and links from the tweets’ texts;
  • get_tweet_polarity: to get our tweets’ polarity (a number between -1 and 1, where -1 indicates a very negative sentiment and 1 a very positive one) we use TextBlob, a Python library for processing textual data;
  • get_tweet_sentiment: converts the polarity into three categories: positive (polarity > 0), negative (polarity < 0) and neutral (polarity = 0);
  • get_tweets: fetches all the tweets matching our query (which we will set in the main function) and appends them to the tweets list, saving three attributes for each fetched tweet: text, sentiment and polarity. Retweeted tweets are appended only once.
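As a quick sanity check, here are the cleaning regex and the polarity bucketing applied to a made-up tweet and made-up polarity scores (TextBlob itself is left out so the snippet stays dependency-free):

```python
import re

def clean_tweet(tweet):
    # same pattern as above: strip mentions, special characters and links
    return ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

def bucket(polarity):
    # same thresholds as get_tweet_sentiment
    if polarity > 0:
        return 'positive'
    elif polarity == 0:
        return 'neutral'
    return 'negative'

print(clean_tweet("Loved the pizza @WipeoutBar! https://t.co/abc123 #yum"))
# -> Loved the pizza yum
print([bucket(p) for p in (0.8, 0.0, -0.3)])
# -> ['positive', 'neutral', 'negative']
```

Note how the mention, the link and the hashtag symbol are all stripped before any sentiment scoring takes place.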

Now I will define my main function. This function will first set the query (‘Pizza San Francisco’) and then split the fetched tweets into positive (ptweets), negative (ntweets) and neutral.

Then, I will plot a pie chart showing the percentages of positive, neutral and negative tweets, as well as a line graph tracking the path of the polarity of the fetched tweets.

Finally, I print the text of the first fetched positive and negative tweets. I will not reproduce all of them here, but I will examine some information which might be useful for my task.

def main():
    api = TwitterClient()
    tweets = api.get_tweets(query='Pizza San Francisco', count=200)

    # splitting the fetched tweets by sentiment
    ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
    print("Positive tweets percentage: {} %".format(100*len(ptweets)/len(tweets)))
    ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
    print("Negative tweets percentage: {} %".format(100*len(ntweets)/len(tweets)))
    print("Neutral tweets percentage: {} %".format(100*(len(tweets) - len(ntweets) - len(ptweets))/len(tweets)))

    # plotting my pie chart with the percentages of positive, negative and neutral tweets
    labels = ['positive', 'negative', 'neutral']
    sizes = [(100*len(ptweets)/len(tweets)), (100*len(ntweets)/len(tweets)), (100*(len(tweets) - len(ntweets) - len(ptweets))/len(tweets))]
    colors = ['yellowgreen', 'gold', 'lightskyblue']
    patches, texts = plt.pie(sizes, colors=colors, shadow=True, startangle=90)
    plt.legend(patches, labels, loc="best")
    plt.axis('equal')
    plt.tight_layout()
    plt.show()

    # plotting my line graph with the polarity's path
    x = [tweet['polarity'] for tweet in tweets]
    plt.plot(x)
    plt.show()

    # printing the first positive and negative tweets
    print("\n\nPositive tweets:")
    for tweet in ptweets[:10]:
        print(tweet['text'])

    print("\n\nNegative tweets:")
    for tweet in ntweets[:10]:
        print(tweet['text'])


if __name__ == "__main__":
    main()
[Pie chart: percentages of positive, negative and neutral tweets]
[Line graph: polarity of the fetched tweets]

As you can see, there are not so many “extreme” sentiments about pizza in San Francisco right now, since almost 50% of the tweets are neutral. However, among the positive tweets displayed I found the following:

“Fantastic Chorizo and Chicken Pizza from @WipeoutBar at Fisherman’s Wharf in San Francisco. Great food and great service if you’re in the #FishermansWharf area.”

Again, this is a very simple task, far from a realistic analysis, but even with this simple, loosely filtered information I can work out some interesting strategies. First, I know that there is already a pizza restaurant in the Fisherman’s Wharf area called WipeoutBar. Hence, I can start googling this place and see the kind of menu it offers, its prices, atmosphere and so forth. I might decide to target the same area with a very different kind of pizza, so that my clients will be different from those of WipeoutBar. Alternatively, I might decide to completely change my area, targeting zones lacking pizza restaurants.

If you think about the accuracy of the filtering procedure you can implement and the amount of information you could gather, you can understand why such an analysis might be pivotal for your business.


Published by valentinaalto

I'm a 22-years-old student based in Milan, passionate about everything related to Statistics, Data Science and Machine Learning. I'm eager to learn new concepts and techniques as well as share them with whoever is interested in the topic.
