What Donald Trump is Tweeting (Analyzing Tweets with NLTK and Pandas)

How does @realDonaldTrump (Donald J. Trump) tweet? Before the President-elect became the president-elect, I didn’t pay much attention to his tweets, but I did know that he seemed to have a unique style of writing them. To me, it felt like he tweeted like he spoke in public. But what was that tone or Trump brand?

From a quick glance at his account, the use of exclamation points seemed prominent. I wondered if that was consistent, and I wondered what else I could find. So I ran an analysis of Trump’s tweets, along with those of @HillaryClinton, @CNN and @FoxNews for comparison.

My strategy for selecting the accounts:

  1. Find another individual user’s account with opposing views, but who had a similar goal over the last year-plus (to become President of the United States)
  2. Compare the individual users’ accounts with “objective” news sources’ Twitter accounts. Since source objectivity is always a topic of hot debate (or a hot topic of debate), I took @CNN and @FoxNews.

The Method

  • Pull as many of Trump’s tweets as I could via the Twitter API. The endpoint for this is statuses/user_timeline. You can only get the most recent 3,200 tweets from any particular user handle, which is unfortunate because plenty of accounts have authored far more than that (e.g. Trump has written over 30,000 tweets).
  • Take the tweets and do some Natural Language Processing with Python’s NLTK.
    • Get the parts of speech tags for every word in every tweet
    • Do word frequency counts
    • Classify the sentiment of the tweets
  • Classify the reading level or difficulty of each word. Determine the reading level of the account. 
    • I didn’t implement this. I started, but then I got distracted.
  • Pull out an aggregate view of some other interesting tweet data, like #HASHTAG usage.
  • Compare data.
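
The first step comes down to paging backwards through a timeline until the API stops handing over tweets. Here is a rough sketch of just that loop. It assumes a hypothetical fetch_page(max_id, count) callable that wraps the real request and returns a list of tweet dicts, newest first, each with an integer "id"; the function names and the fake timeline are illustrative only.

```python
def collect_timeline(fetch_page, page_size=200, limit=3200):
    """Page backwards through a timeline until it runs dry or hits the limit.

    fetch_page(max_id, count) is a hypothetical callable that returns a list
    of tweet dicts (newest first), each carrying an integer "id".
    """
    tweets = []
    max_id = None  # None means "start from the newest tweet"
    while len(tweets) < limit:
        page = fetch_page(max_id, min(page_size, limit - len(tweets)))
        if not page:
            break
        tweets.extend(page)
        # Ask for tweets strictly older than the oldest one seen so far.
        max_id = page[-1]["id"] - 1
    return tweets


# Stand-in for the real API call: 500 fake tweets with ids 500 down to 1.
FAKE_TIMELINE = [{"id": i, "text": "tweet %d" % i} for i in range(500, 0, -1)]


def fake_fetch_page(max_id, count):
    older = [t for t in FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return older[:count]


all_tweets = collect_timeline(fake_fetch_page)
print(len(all_tweets))
```

Swapping fake_fetch_page for a function that calls the real endpoint with your credentials should be the only change needed, assuming the response is a list of tweets carrying an id field.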

And Now, The Data

Punctuation! Punctuation! Punctuation! Trumpunctuation?

If you’ve seen Trump speak, you have probably noticed his strong intonation and generally emphatic demeanor. His tweets seem to capture this partially through punctuation alone. His words are sprinkled with the strongest punctuation mark in the English language, the exclamation mark.

“!” occurs 2,336 times across the 3,200 Trump tweets.

And 1,954 of those 3,200 tweets contain at least one “!”.

Which means that 61% of Trump’s tweets contain an exclamation mark (based on my 3200 tweet dataset). This seemed astonishingly high; when I first pulled this I thought my code was incorrect. So I opened my file of tweets and eyeballed it to confirm. I figured if 60+ % of tweets contained an exclamation mark, then it would be easy to confirm my sanity (or lack thereof) from the text file.
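
If you want to check numbers like these yourself, the counting is simple. Below is a minimal sketch in plain Python; the function name is made up, and the sample list mixes three tweets quoted in this post with one invented tweet that has no exclamation mark.

```python
def exclamation_stats(tweets):
    """Count exclamation marks across a list of tweet texts."""
    per_tweet = [text.count("!") for text in tweets]
    tweets_with_mark = sum(1 for n in per_tweet if n > 0)
    return {
        "total_marks": sum(per_tweet),
        "tweets_with_mark": tweets_with_mark,
        "share_with_mark": tweets_with_mark / len(tweets) if tweets else 0.0,
        "max_in_one_tweet": max(per_tweet) if per_tweet else 0,
    }


sample_tweets = [
    "#WheresHillary? Sleeping!!!!!",
    "TODAY WE MAKE AMERICA GREAT AGAIN!",
    "Fidel Castro is dead!",
    "A made-up tweet with no emphasis at all.",  # invented for the example
]
stats = exclamation_stats(sample_tweets)
print(stats)

# The share reported above: 1,954 of 3,200 tweets.
print(round(100 * 1954 / 3200))
```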

[Image: some of Trump’s tweets, showing his usage of exclamation marks]

The highest frequency of “!” in a single tweet was five. And that occurred in the following tweet:

“#WheresHillary? Sleeping!!!!!”
It was retweeted 27,158 times and favorited 61,084 times. It was created on August 20, 2016.


Here’s the !!!! comparison among the entire group:

Account            Absolute Count (!)
@realDonaldTrump   2336
@HillaryClinton    171
@CNN               43
@FoxNews           134

Adjectives Are Very Great

Trump’s top twenty favorite adjectives are listed below, along with the top 20 adjectives from the comparison group. It’s not really a surprise that “great” was number one for Trump because, among other things, his campaign’s slogan was “Make America Great Again.” Notably, I removed stopwords and only grabbed lowercase words to help filter out some noise from the data. Relative frequency is each adjective’s count as a percentage of all adjectives counted in that account’s dataset.

trump_word abs_frequency rel_frequency
great 211 4.953052
bad 81 1.901408
many 81 1.901408
big 77 1.807512
last 66 1.549296
new 56 1.314554
good 50 1.173709
much 34 0.798122
amazing 32 0.751174
total 30 0.704225
wonderful 30 0.704225
nice 27 0.633803
massive 25 0.586854
first 25 0.586854
presidential 23 0.539906
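
For the curious, here is roughly how a table like this can be built. This is a sketch, not the exact code from the repo: it assumes the tweets have already been tagged into (word, tag) pairs, which is the shape nltk.pos_tag(nltk.word_tokenize(text)) returns, and it uses a tiny hypothetical stopword set in place of a real stopword list.

```python
from collections import Counter

# Tiny hypothetical stand-in for a real stopword list.
STOPWORDS = {"the", "a", "own", "other", "such"}


def adjective_frequencies(tagged_tokens, stopwords=STOPWORDS):
    """Return (word, abs_frequency, rel_frequency) rows for adjectives.

    tagged_tokens is an iterable of (word, tag) pairs using Penn Treebank
    tags, where adjective tags start with "JJ".
    """
    adjectives = [
        word
        for word, tag in tagged_tokens
        if tag.startswith("JJ") and word.islower() and word not in stopwords
    ]
    counts = Counter(adjectives)
    total = sum(counts.values())
    return [(word, n, 100.0 * n / total) for word, n in counts.most_common()]


# Hand-tagged sample so the example runs without downloading NLTK models.
sample_tagged = [
    ("great", "JJ"), ("country", "NN"), ("great", "JJ"), ("bad", "JJ"),
    ("Great", "JJ"), ("such", "JJ"), ("big", "JJ"),
]
rows = adjective_frequencies(sample_tagged)
for word, abs_frequency, rel_frequency in rows:
    print(word, abs_frequency, rel_frequency)
```

As a sanity check on the table above, 211 uses of “great” at a relative frequency of 4.953052 implies about 4,260 filtered adjectives in total (211 / 0.04953052).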


Below is the list of top 20 adjectives from @realDonaldTrump, @HillaryClinton, @CNN, @FoxNews from their last 3200 tweets.



General Sentiment of Tweets

To calculate the sentiment of the tweets, I used a function in the nltk library called demo_liu_hu_lexicon() which classifies each word of the sentence as Positive, Negative, or Neutral, and then does a basic count of each word-classification category. Whichever group has the highest count is how the text will get assigned. There are definitely better ways to do this. I considered an integration with IBM’s Watson, but time was doing its thing, being time, and being of the essence and such.

Sentiment   @realDonaldTrump   @HillaryClinton   @CNN       @FoxNews
Positive    0.463750           0.468750          0.304688   0.235625
Neutral     0.285313           0.401875          0.425938   0.455937
Negative    0.250937           0.129375          0.269375   0.308437
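
To make the counting idea concrete, here is a stripped-down sketch of a lexicon-based classifier. It is not NLTK’s implementation: the two word sets are tiny hypothetical stand-ins for the Liu and Hu opinion lexicon, and the decision rule (more positive words than negative means Positive, the reverse means Negative, a tie means Neutral) is my simplified reading of the approach.

```python
from collections import Counter

# Tiny hypothetical lexicons; the real opinion lexicon is far larger.
POSITIVE = {"great", "wonderful", "amazing", "good", "nice"}
NEGATIVE = {"bad", "unfair", "horrific", "nasty", "dead"}


def classify_sentiment(text, positive=POSITIVE, negative=NEGATIVE):
    """Label a text by comparing counts of positive and negative words."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    pos = sum(1 for w in words if w in positive)
    neg = sum(1 for w in words if w in negative)
    if pos > neg:
        return "Positive"
    if neg > pos:
        return "Negative"
    return "Neutral"


def sentiment_shares(texts):
    """Share of texts per label, like one column of the table above."""
    labels = Counter(classify_sentiment(t) for t in texts)
    return {k: labels[k] / len(texts) for k in ("Positive", "Neutral", "Negative")}


sample = [
    "TODAY WE MAKE AMERICA GREAT AGAIN!",
    "Fidel Castro is dead!",
    "Very unfair!",
    "Met with President Obama for first time.",
]
shares = sentiment_shares(sample)
print(shares)
```

A word-counting approach like this ignores negation and sarcasm, which is part of why there are better ways to do it.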

Top Hashtags

This section contains a top 5 hashtag summary table for the entire analysis group, and also has the top 20 hashtags for each account listed afterward. You can infer what you will from this data.

I did find it interesting that the top-used hashtag by Trump was one of self-promotion, and Hillary Clinton used a lot of hashtags relating to the debates. Just looking at the hashtag data makes me think that Trump’s social media strategy was much stronger throughout the campaign.

In addition, all four Twitter accounts had at least one hashtag with “Trump” in it. From a marketing perspective, that’s good brand awareness.

Top 5 Hashtag Summaries

@realDonaldTrump Top 5 Hashtags: #Trump2016, #MakeAmericaGreatAgain, #MAGA, #AmericaFirst, #DrainTheSwamp
@HillaryClinton Top 5 Hashtags: #DebateNight, #DemsInPhilly, #VPDebate, #RNCinCLE, #debate
@CNN Top 5 Hashtags: #CNNHeroes, #CNNSOTU, #Aleppo, #CNNNYE, #JadonAndAnias
@FoxNews Top 5 Hashtags: #KellyFile, #Trump, #FoxNews2017, #Hannity, #Christmas

@realDonaldTrump Hashtags

abs_frequency hashtag rel_frequency_pct
238 Trump2016 15.130324
202 MakeAmericaGreatAgain 12.841704
111 MAGA 7.056580
79 AmericaFirst 5.022250
78 DrainTheSwamp 4.958678
57 ImWithYou 3.623649
57 BigLeagueTruth 3.623649
53 VoteTrump 3.369358
38 CrookedHillary 2.415766
36 Debate 2.288620
35 TrumpTrain 2.225048
34 TrumpPence16 2.161475
24 Debates2016 1.525747
22 ICYMI 1.398601
20 SuperTuesday 1.271456
18 VPDebate 1.144310
15 RNCinCLE 0.953592
14 Debates 0.890019
13 WIPrimary 0.826446
12 ThankYouTour2016 0.762873
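
A table like the one above needs little more than a regular expression and a counter. The sketch below is one way to do it, with an illustrative function name and a simple #\w+ pattern that I am assuming is close enough to how Twitter delimits hashtags. Case is preserved, so #DebateNight and #debatenight are counted separately.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_table(tweets, top_n=20):
    """Return (total_hashtags_used, rows) where each row is
    (abs_frequency, hashtag, rel_frequency_pct)."""
    tags = [tag for text in tweets for tag in HASHTAG_RE.findall(text)]
    counts = Counter(tags)
    total = len(tags)
    rows = [(n, tag, 100.0 * n / total) for tag, n in counts.most_common(top_n)]
    return total, rows


# Tweets quoted elsewhere in this post.
sample_tweets = [
    "#WheresHillary? Sleeping!!!!!",
    "RT if you're proud of Hillary tonight. #DebateNight #SheWon",
    "RT this if you're proud to be standing with Hillary tonight. #debatenight",
    "Don't stand still. Vote today: #ElectionDay #MannequinChallenge",
]
total_hashtags_used, rows = hashtag_table(sample_tweets)
print(total_hashtags_used)
for row in rows:
    print(row)
```

The relative frequencies in the table follow the same formula: 238 uses of Trump2016 out of 1,573 total hashtags is 15.130324 percent.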

Below are the top hashtags for the entire group.

It’s also worth sharing how many total hashtags were used in the last 3200 tweets of each Twitter account in the dataset group:

Account            total_hashtags_used
@realDonaldTrump   1573
@HillaryClinton    482
@CNN               143
@FoxNews           810


Top Favorited Tweets

Top Favorited of @realDonaldTrump

text favorite_count created_at
Such a beautiful and important evening! The forgotten man and woman will never be forgotten again. We will all come together as never before 639428 Wed Nov 09 11:36:58 +0000 2016
TODAY WE MAKE AMERICA GREAT AGAIN! 577008 Tue Nov 08 11:43:14 +0000 2016
How long did it take your staff of 823 people to think that up–and where are your 33,000 emails that you deleted? 296236 Thu Jun 09 20:40:32 +0000 2016
The media is spending more time doing a forensic analysis of Melania's speech than the FBI spent on Hillary's emails. 248370 Wed Jul 20 15:36:06 +0000 2016
Just had a very open and successful presidential election. Now professional protesters, incited by the media, are protesting. Very unfair! 234619 Fri Nov 11 02:19:44 +0000 2016
Love the fact that the small groups of protesters last night have passion for our great country. We will all come together and be proud! 224497 Fri Nov 11 11:14:20 +0000 2016
Nobody should be allowed to burn the American flag – if they do, there must be consequences – perhaps loss of citizenship or year in jail! 216064 Tue Nov 29 11:55:13 +0000 2016
Fidel Castro is dead! 212487 Sat Nov 26 13:08:11 +0000 2016
This will prove to be a great time in the lives of ALL Americans. We will unite and we will win, win, win! 204239 Sat Nov 12 15:05:33 +0000 2016
A fantastic day in D.C. Met with President Obama for first time. Really good meeting, great chemistry. Melania liked Mrs. O a lot! 194927 Fri Nov 11 02:10:46 +0000 2016


Top Favorited of @HillaryClinton

text favorite_count created_at
“I never said that.” —Donald Trump, who said that. #debatenight 160098 Tue Sep 27 01:19:47 +0000 2016
Where was this kind of comedy last night? 134956 Fri Oct 21 17:51:34 +0000 2016
“Trump just criticized me for preparing for this debate. You know what else I prepared for? Being president.” #DebateNight 112647 Tue Sep 27 02:01:59 +0000 2016
Women have the power to stop Trump. 111546 Fri Oct 07 23:54:35 +0000 2016
“Nobody respects women more than me.” —Donald Trump earlier tonight “Such a nasty woman.” —Donald Trump just now #DebateNight 108114 Thu Oct 20 02:36:56 +0000 2016
Don't stand still. Vote today: #ElectionDay #MannequinChallenge 95882 Tue Nov 08 11:47:18 +0000 2016
Happy birthday to this future president. 94281 Wed Oct 26 13:03:18 +0000 2016
RT if you’re proud of Hillary tonight. #DebateNight #SheWon 93121 Thu Oct 20 02:37:43 +0000 2016
RT this if you're proud to be standing with Hillary tonight. #debatenight 92721 Tue Sep 27 02:45:44 +0000 2016
This is horrific. We cannot allow this man to become president. 91471 Fri Oct 07 20:55:13 +0000 2016


Top Favorited of @CNN

text favorite_count created_at
At least 60 people have been hurt in an explosion at a fireworks market near Mexico City, local media report.… 22570 Tue Dec 20 22:49:24 +0000 2016
This little boy is the newest face of OshKosh B'gosh's holiday ads after initially being turned down by a talent ag… 22506 Sat Dec 10 07:01:29 +0000 2016
“We've been friends for a long time”: Kanye West and President-elect Trump appear together at Trump Tower… 19082 Tue Dec 13 14:59:55 +0000 2016
This little boy is the newest face of OshKosh B'gosh's holiday ads after initially being turned down by a talent ag… 12099 Sat Dec 10 23:45:25 +0000 2016
Clinton jokes at Reid portrait unveiling: “After a few weeks of taking selfies in the woods, I thought it would be… 11868 Thu Dec 08 22:23:56 +0000 2016
BREAKING: President Obama vows retaliatory action against Russia for its meddling in the US presidential election… 9767 Fri Dec 16 01:36:31 +0000 2016
More Americans voted for Hillary Clinton than any other losing presidential candidate in US history 9273 Wed Dec 21 23:28:00 +0000 2016
This little boy is the newest face of OshKosh B'gosh's holiday ads after initially being turned down by a talent ag… 9117 Mon Dec 12 00:42:57 +0000 2016
Mariah Carey, Adele, Elton John and Lady Gaga bring the holiday spirit in special Christmas-themed 'Carpool Karaoke… 7475 Mon Dec 19 05:30:24 +0000 2016
'Dear world, why are you silent?': Desperate pleas from inside Aleppo 7355 Wed Dec 14 19:00:04 +0000 2016


Top Favorited of @FoxNews

text favorite_count created_at
JUST IN: President-elect #Trump announces @Sprint will bring 5,000 jobs back to the U.S., and OneWeb will hire 3,00… 18974 Wed Dec 28 22:17:35 +0000 2016
.@JudgeJeanine: “Michelle, you may not realize it, but Americans rejected you and everything you stand for.”… 15054 Sun Dec 18 17:17:51 +0000 2016
#Breaking News: President-elect @realDonaldTrump has garnered the 270 #ElectoralCollege votes needed to become pres… 11164 Mon Dec 19 22:35:53 +0000 2016
Peters: “Without the least exaggeration, we can say that President Obama has been the worst foreign policy presiden… 10263 Sat Dec 31 03:32:46 +0000 2016
.@realDonaldTrump: “Michelle Obama said yesterday that there's no hope, but I assume she was talking about the past… 10021 Sat Dec 17 22:43:31 +0000 2016
.@realDonaldTrump: “We have to protect Israel. Israel, to me, is very, very important. We have to protect Israel.” 10021 Sun Jan 01 03:26:17 +0000 2017
Giuliani: “The U.S. Constitution doesn’t give anyone in this world the right to come to the U.S. That’s a privilege… 8885 Wed Dec 21 03:41:24 +0000 2016
.@KatrinaPierson: This president is the divider-in-chief. His entire political career revolved around racism, sexis… 8700 Fri Dec 30 01:43:04 +0000 2016
.@GovMikeHuckabee: Can you name me one Muslim country that welcomes Christians to build & protect churches? No, you… 8419 Thu Dec 29 01:39:42 +0000 2016
DJT: “I think the Democrats are putting it out because they suffered 1 of the greatest defeats in the history of po… 8175 Sun Dec 11 19:08:47 +0000 2016

About the Data

If you would like to access some of the code/data, it is publicly available on my GitHub repo. I’ve also included all four files that contain all of the tweets on which I ran the analysis in the data/ directory.

I had plans to include Date/Time data analysis in this post and many other things (if you’d like to see more data, let me know), but you have to stop somewhere great!!!!!! #MakeCodingProjectsSmallAgain


How to Retrieve and Analyze Your iOS Messages with Python, Pandas and NLTK

I’m one of those people that keeps every text message I send or receive — I never delete them. Meet a girl at a bar, text her the next day and never hear back from her? I keep that. Weird wrong-number texts? I keep those too. Ex-girlfriend texts? Definitely keepers.

I had 65,378 messages on my phone at the time of writing this post.

I’m not a digital hoarder or anything; I primarily do this because I like the idea of being able to search through the past. But, digital hoarder or not, collecting anything takes up some sort of space, and when I found that my text messages were taking up 4 GB of space on my phone, I decided it was time to back them up. It was at that point that I realized I could also probably analyze them.

As it turns out, you can do this, and I’ll tell you how. For this project, I used Python/Pandas/NLTK for the analysis and an iPython Notebook to render the datasets. I’ve also uploaded the code to GitHub, which you can view here.

An overview of the steps to make this happen:

  1. Sync/back up your iPhone because the messages need to be stored on your computer.
  2. Load the SQLite file and retrieve all messages
    • You can follow the directions for retrieving the right file here.
  3. Analyze those messages (I used Pandas)!!

Let’s get into some details.

You need to sync and back up your phone’s contents to your computer. There’s a great post on how to do this here. In case you want to skip that read, the end goal is a file with the text messages in it, which you copy and move into your working directory.

You can find the file with this bash command:

$ find / -name 3d0d7e5fb2ce288813306e4d4636395e047a3d28

Now, loading the SQLite file — you can actually see what’s in this file via the command line:

 $ sqlite3 3d0d7e5fb2ce288813306e4d4636395e047a3d28 

Then you can check out the available tables:

sqlite> .tables
_SqliteDatabaseProperties chat_message_join
attachment handle
chat message
chat_handle_join message_attachment_join

From here, the main tables I found useful were “message” and “handle.” The former contains all of your text messages, and the latter contains all of the senders/recipients. I only wrote code around the messages table, primarily because I could never figure out how to make a join between message and handle, but that was probably something trivial that I overlooked. Please tell me how you did it, if you did!
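
One possible answer to the join question, as a sketch: it assumes message.handle_id points at handle.ROWID and that handle.id holds the phone number or email address, which is how the schema appears to be laid out. Run .schema message and .schema handle in the sqlite3 shell to confirm on your own backup. The example builds a throwaway in-memory database with made-up rows so it runs anywhere.

```python
import sqlite3

# Simplified, in-memory imitation of the two tables (real ones have more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE handle (ROWID INTEGER PRIMARY KEY, id TEXT);
    CREATE TABLE message (ROWID INTEGER PRIMARY KEY, text TEXT,
                          handle_id INTEGER, is_sent INTEGER);
    INSERT INTO handle (ROWID, id) VALUES
        (1, '+15550001111'), (2, 'friend@example.com');
    INSERT INTO message (text, handle_id, is_sent) VALUES
        ('hello', 1, 0), ('hi back', 1, 1), ('lunch?', 2, 0);
""")

# Assumption: message.handle_id references handle.ROWID.
JOIN_QUERY = """
    SELECT message.text, message.is_sent, handle.id AS contact
    FROM message
    LEFT JOIN handle ON message.handle_id = handle.ROWID
    ORDER BY message.ROWID
"""
rows = conn.execute(JOIN_QUERY).fetchall()
for row in rows:
    print(row)
```

Against a real backup, you would pass the path of the copied SQLite file to sqlite3.connect instead of ":memory:" and skip the CREATE and INSERT statements.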

Continuing on, the message table has lots of columns in it, and I chose to select from the following:

['guid', 'service', 'text', 'date', 'date_delivered', 
'handle_id', 'type', 'is_read','is_sent', 'is_delivered',
'item_type', 'group_title']

The key field is “text,” which is where the content of the message is stored, which includes emojis! (A cool thing is that your emojis will show up if you try to plot them in something like an iPython notebook. You could run an entire analysis on emoji usage…)

My analysis, however, ultimately breaks down into two pieces:

  1. Analyzing the content of the “text” field (excluding emojis).
  2. Analyzing the messages themselves (for example, total text messages, or, what I sent vs. what I received, for instance).

For #1, I wrote code that:

  • Classifies all words and assigns a part of speech to each, then checks the counts of each part of speech.
    • You should get a table looking like this.


  • Counts the number of times each word appears in the dataset, and gives an overview of the dataset:
    • total_words_filtered
  • Excludes boring words, like prepositions, and words that are < 2 characters.
  • Classifies all words as is_bad=1 or 0. I did this by using a .txt file full of bad words, found here:
  • Plots usage of bad words
    • I’d love to show you my plot, but let’s just assume I never swear…
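
Here is a small sketch of the word-count and is_bad steps in plain Python. The two word sets are hypothetical placeholders: BORING_WORDS stands in for the prepositions and other filler being excluded, and BAD_WORDS stands in for the contents of the bad-words .txt file.

```python
import re
from collections import Counter

BORING_WORDS = {"in", "on", "at", "to", "of", "with", "the", "and"}  # placeholder
BAD_WORDS = {"darn", "heck"}  # placeholder for the .txt word list


def word_table(messages, boring=BORING_WORDS, bad=BAD_WORDS):
    """Count words across messages, skipping boring words and words
    shorter than 2 characters, and flag each word as is_bad 1 or 0."""
    words = []
    for text in messages:
        for word in re.findall(r"[a-z']+", text.lower()):
            if len(word) >= 2 and word not in boring:
                words.append(word)
    counts = Counter(words)
    rows = [
        {"word": w, "count": n, "is_bad": int(w in bad)}
        for w, n in counts.most_common()
    ]
    return {"total_words_filtered": len(words), "rows": rows}


sample_messages = ["Darn, I missed the bus", "See you at the bar", "darn it"]
table = word_table(sample_messages)
print(table["total_words_filtered"])
print(table["rows"][0])
```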

For #2, the code allows you to:

  • Plot the number of text messages received each day (check out the spike on your birthday or during holidays). You can see my data below has a huge gap; that’s when my phone was replaced and not backed up for many months. My timestamp conversions are also apparently incorrect, but I haven’t looked into it.
    • The timestamp conversion is off, so someone can fix that... we're not in 2016, yet... Are we??


  • Count the number of sent versus received messages.
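
On the timestamp problem, one likely culprit is the epoch. The sketch below assumes the date column counts seconds from 2001-01-01 UTC rather than from the Unix epoch; some newer backups reportedly store nanoseconds instead, in which case you would divide by 1e9 first. Treat both points as assumptions to check against a message whose date you know. The same sketch also tallies messages per day and sent versus received, using the is_sent column.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: the "date" column counts seconds from this epoch.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def apple_timestamp_to_datetime(seconds):
    """Convert seconds since 2001-01-01 UTC into an aware datetime."""
    return APPLE_EPOCH + timedelta(seconds=seconds)


def summarize(messages):
    """messages: list of dicts with "date" (seconds) and "is_sent" (0 or 1)."""
    per_day = Counter(
        apple_timestamp_to_datetime(m["date"]).date() for m in messages
    )
    sent = sum(1 for m in messages if m["is_sent"])
    return per_day, {"sent": sent, "received": len(messages) - sent}


# Made-up rows: two messages on the first day, one on the second.
sample = [
    {"date": 0, "is_sent": 1},
    {"date": 3600, "is_sent": 0},
    {"date": 86400, "is_sent": 0},
]
per_day, totals = summarize(sample)
print(sorted(per_day.items()))
print(totals)
```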

Anyway, I hope you can get some use out of this, and instead of blabbing on about the code here, I’ll just let you read it and use it on your own. Please check out my git repo, and please reach out to me with questions, comments, etc.


How to Create Geo HeatMaps with Pandas Dataframes and Google Maps JavaScript API V3

Get excited because we’re going to make a heatmap with Python Pandas and Google Maps JavaScript API V3. I’m assuming the audience has plenty of previous knowledge in Python, Pandas, and some HTML/CSS/JavaScript. Let’s begin with the DataFrame.

The DataFrame

First, you’re going to need a dataframe of “addresses” (can be a physical address, or even just a country name, like USA) that you eventually want to plot. (For the sake of simplicity, I’ll try to refer to the “address” as the “geo” for the rest of this document.) Second, since you are planning on using a heatmap, you’re going to want some sort of number that represents the weighted value of that row in comparison to other rows.

Let’s say your DataFrame looked like this:

# count the unique pink kittens per country, largest first.
grouped_country_df = main_df.groupby('country')\
                            .agg({'pink_kitten': lambda x: len(x.unique())})\
                            .sort_values('pink_kitten', ascending=False)
print(grouped_country_df)
geo_name count_of_pink_kittens
USA 3430
Spain 577
United Kingdom 352
Israel 292
Austria 196
Argentina 151
India 133
Singapore 66

Now you have a list of geos and some values to use as the weight when later creating the heatmap. But to plot these points, you’re going to need some lat and long coordinates.

Getting Lat Long Coordinates from Google Maps API

If you have a list of geos or “addresses,” you can use Geocoding to convert those geos into lat/long coordinates. From Google: “Geocoding is the process of converting addresses (like “1600 Amphitheatre Parkway, Mountain View, CA”) into geographic coordinates (like latitude 37.423021 and longitude -122.083739), which you can use to place markers on a map, or position the map.”

To use this Google Maps service, you need to have a Google Maps API key. To get a key, you can follow the directions here. When you sign up for an API key, you should select “Server Side Key,” since we will be running a Python script server-side to access the Google Maps API.

Once you have your api_key, you can work on getting geocoded results for all of your geos. You can do this with the following code:

import requests

# set your google maps api key here.
google_maps_api_key = ''

# Geocoding API endpoint; the geo name goes in the "address" query string parameter.
geocode_url = 'https://maps.googleapis.com/maps/api/geocode/json'

# get the list of countries from our DataFrame.
countries = grouped_country_df.index
for country in countries:
    # make request to google_maps api and store as json. requests URL-encodes the
    # parameters, so geo names with spaces (like "United Kingdom") are handled.
    params = {'address': country, 'key': google_maps_api_key}
    r = requests.get(geocode_url, params=params).json()

    # Get lat and long from response. "location" contains the geocoded lat/long value.
    # For normal address lookups, this field is typically the most important.
    lat = r['results'][0]['geometry']['location']['lat']
    lng = r['results'][0]['geometry']['location']['lng']

This only gets you so far, since you still need to do something with those latitude and longitude coordinates. We have a few options here:

  1. If you are building a web application, you can pass those values into an HTML template as variables and they will end up getting plotted via JavaScript.
  2. We can print out the JavaScript in the format we need, and later paste it into our HTML file within script tags.
  3. Other approaches that I’m not going to talk about.

For the sake of time, I’m going to show #2, which lends itself to a one-off analysis. You’d probably want to go with some dynamic templating approach, like #1, if you are going to pull and plot the same data repeatedly.

Add the following code to your for-loop from above, right underneath

lng = r['results'][0]['geometry']['location']['lng']

    # set the country weight for later, by getting the value for each index in the
    # dataframe as it loops through.
    country_weight = int(grouped_country_df.loc[country, 'pink_kitten'])
    # print out the JavaScript that we will be copy-pasting into our HTML file
    print('{location: new google.maps.LatLng(%s, %s), weight: %s},' % (lat, lng, country_weight))

After running your script, copy the output, which should look like this:

{location: new google.maps.LatLng(37.09024, -95.712891), weight: 3430},
{location: new google.maps.LatLng(40.463667, -3.74922), weight: 577},
{location: new google.maps.LatLng(55.378051, -3.435973), weight: 352},
{location: new google.maps.LatLng(31.046051, 34.851612), weight: 292},
{location: new google.maps.LatLng(47.516231, 14.550072), weight: 196},
{location: new google.maps.LatLng(-38.416097, -63.616672), weight: 151},
{location: new google.maps.LatLng(20.593684, 78.96288), weight: 133},
{location: new google.maps.LatLng(1.352083, 103.819836), weight: 66},

You’re going to use these values in the next step.
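
One thing the loop above does not handle is a geo that fails to geocode, in which case r['results'] is empty and indexing [0] raises an error. Here is a small sketch that separates parsing from formatting and skips empty results. The helper names are made up, and the sample dictionaries only imitate the parts of the response the code reads.

```python
def extract_lat_lng(geocode_response):
    """Return (lat, lng) from a geocoding response dict, or None if empty."""
    results = geocode_response.get("results") or []
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


def heatmap_line(lat, lng, weight):
    """Format one weighted point as a line of JavaScript for heatmapData."""
    return "{location: new google.maps.LatLng(%s, %s), weight: %s}," % (
        lat, lng, weight)


# Trimmed-down imitations of a successful and an empty response.
usa_response = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 37.09024, "lng": -95.712891}}}],
}
empty_response = {"status": "ZERO_RESULTS", "results": []}

lines = []
for response, weight in [(usa_response, 3430), (empty_response, 1)]:
    coords = extract_lat_lng(response)
    if coords is None:
        continue  # skip geos that could not be geocoded
    lines.append(heatmap_line(coords[0], coords[1], weight))

print("\n".join(lines))
```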

Creating an HTML file that contains Javascript for Plotting your Lat Long Points.

You need to create an HTML file that contains some script tags within it. I am simply going to paste my code below with annotations. If you copy the location strings from above, you will be able to paste them directly into this HTML file under the “heatmapData” array (defined below in the code).

<!DOCTYPE html>
<html>
  <head>
    <title>Simple Map</title>
    <meta name="viewport" content="initial-scale=1.0, user-scalable=no">
    <meta charset="utf-8">
    <style>
      html, body, #map-canvas {
        height: 100%;
        margin: 0px;
        padding: 0px
      }
    </style>
    <!-- Load Google Maps API, including the visualization library that provides
         HeatmapLayer. Replace YOUR_API_KEY with your own key. -->
    <script src="https://maps.googleapis.com/maps/api/js?key=YOUR_API_KEY&libraries=visualization"></script>
    <script>
    function initialize() {
      var heatmapData = [
        {location: new google.maps.LatLng(37.09024, -95.712891), weight: 3430},
        {location: new google.maps.LatLng(40.463667, -3.74922), weight: 577},
        {location: new google.maps.LatLng(55.378051, -3.435973), weight: 352},
        {location: new google.maps.LatLng(31.046051, 34.851612), weight: 292},
        {location: new google.maps.LatLng(47.516231, 14.550072), weight: 196},
        {location: new google.maps.LatLng(-38.416097, -63.616672), weight: 151},
        {location: new google.maps.LatLng(20.593684, 78.96288), weight: 133},
        {location: new google.maps.LatLng(1.352083, 103.819836), weight: 66},
      ];
      // Add some custom styles to your google map. This can be a pain.
      var styles = [
        {
          "featureType": "administrative",
          "stylers": [
            { "visibility": "off" }
          ]
        },
        {
          "featureType": "road",
          "stylers": [
            { "visibility": "off" }
          ]
        },
        {
          "featureType": "landscape",
          "elementType": "geometry.fill",
          "stylers": [
            { "color": "#ffffff" },
            { "visibility": "on" }
          ]
        }
      ];
      // create a point on the map for the Atlantic Ocean,
      // which will later be used for centering the map.
      var atlanticOcean = new google.maps.LatLng(24.7674044, -38.2680446);
      // Create the styled map object.
      var styledMap = new google.maps.StyledMapType(styles, {name: "Styled Map"});
      // create the base map object. put it in the map-canvas id, defined in HTML below.
      var map = new google.maps.Map(document.getElementById('map-canvas'), {
        center: atlanticOcean, // set the starting center point as the atlantic ocean
        zoom: 3, // set the starting zoom
        mapTypeControlOptions: {
          mapTypeIds: [google.maps.MapTypeId.ROADMAP, 'map_style'] // give the map a type.
        }
      });
      // Create the heatmap object.
      var heatmap = new google.maps.visualization.HeatmapLayer({
        data: heatmapData, // pass in your heatmap data to plot in this layer.
        opacity: 1,
        dissipating: false // on zoom, do you want dissipation?
      });
      heatmap.setMap(map); // apply the heatmap to the base map object.
      map.mapTypes.set('map_style', styledMap); // apply the styles to your base map.
      // Add a custom Legend to Your Map
      var legend = document.getElementById('legend');
      // This is hard-coded for the countries I knew existed in the set.
      // Add the remaining countries from your own dataset here.
      var country_list = ['USA', 'Spain', 'United_Kingdom', 'Israel'];
      // for each country in the country list, append it to the Legend div.
      for (var i = 0; i < country_list.length; i++) {
        var div = document.createElement('div');
        div.innerHTML = '<p>' + country_list[i] + '</p>';
        legend.appendChild(div);
      }
    }
    google.maps.event.addDomListener(window, 'load', initialize);
    </script>
  </head>
  <body>
    <div id="legend" style="background-color:grey;padding:10px;">
      <strong>Countries Mapped</strong>
    </div>
    <div id="map-canvas"></div>
  </body>
</html>

Open the HTML file in your browser, and you should see something like this.

[Image: Google Maps heatmap]

Et Voila!