Meet the Twitter Collection and Analysis Toolkit (TCAT), a graphing and modeling tool for Twitter data. TCAT collects tweets, then processes the data for network analysis and visualization. With this cloud-based software, algorithms quickly sort social data sets in the tens of millions of units to find people or items of importance. It requires no programming or expert software knowledge.
How do you get your hands on this new tool? At the moment the best way is to form a partnership with the Division of Emerging Media Studies at Boston University. See below for more details. Visit this page for an interactive sample.
Jacob Groshek is an assistant professor of Emerging Media at BU and has been the architect supervising the installation, development, and maintenance of the TCAT. The software went live this September and has since archived nearly 65 million Tweets on topics including Ebola, Gaza, Ferguson, gubernatorial campaigns, and many others. Groshek and his research team at BU and at the Betweetness Labs Consultancy have been examining and testing TCAT’s ability to locate influential Twitter users and to identify the most prominent keywords used to describe brands and topics.
Figure 1: Prominent keywords co-mentioning ‘selfie’ and ‘snapchat’ hashtags from 1,527,433 Tweets graphed into 1,013 nodes and 24,650 undirected edges.
So what can TCAT do?
TCAT can find all the hashtags related to a common subject. For example, if you started a keyword track on ‘Nike’ and ‘Coca Cola’, it would collect all tweets that mention any of those keywords, in any order and with or without hashtags. There are additional options for collecting data on more specific key terms or user accounts, and at wider but still relevant levels.
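TCAT handles this matching for you, but for readers curious about the logic, here is a minimal Python sketch of "any keyword, in any order, with or without hashtags." It is an illustration only and not TCAT's own code: the function names and the rule that a hashtag such as #CocaCola counts as a match for "Coca Cola" are assumptions made for this example.

```python
import re


def tokens(text):
    """Lowercase a tweet and split it into alphanumeric words.

    Punctuation such as '#' and '@' is dropped, so '#Nike' becomes 'nike'.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_track(text, tracked_phrases):
    """Return True if the tweet mentions any tracked phrase.

    Assumed rule (hypothetical, for illustration): a phrase matches when
    all of its words appear anywhere in the tweet, in any order, or when
    the words appear run together as one token (e.g. '#CocaCola').
    """
    words = tokens(text)
    for phrase in tracked_phrases:
        parts = phrase.lower().split()
        if not parts:
            continue  # ignore empty phrases
        if set(parts) <= words or "".join(parts) in words:
            return True
    return False


# Made-up sample tweets, filtered against a two-phrase track.
track = ["Nike", "Coca Cola"]
sample = [
    "Love my new #Nike shoes",
    "Drinking #CocaCola at the game",
    "Nothing to see here",
]
collected = [tweet for tweet in sample if matches_track(tweet, track)]
```

Matching on whole words rather than raw substrings keeps a track on "Nike" from picking up unrelated words that merely contain those letters.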
Once the tweets are collected, you can measure the popularity and frequency of various hashtags and create minute-by-minute timelines of Twitter activity. You can determine when one hashtag overshadowed another, almost down to the minute. You can find the visibility of a user by the number of mentions they receive and their user stats: how many followers they have, who follows them, how many favorites they’ve received, their interests, and whom they share those interests with.
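To make the minute-by-minute idea concrete, here is a small Python sketch that buckets hashtag counts by minute and finds the first minute in which one hashtag outpaces another. It is a simplified illustration, not TCAT's implementation: the input format (pairs of ISO timestamp and tweet text), the function names, and the sample tweets are all assumptions.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

HASHTAG = re.compile(r"#(\w+)")


def hashtag_timeline(tweets):
    """Map each minute to a Counter of the hashtags tweeted in that minute.

    `tweets` is an iterable of (iso_timestamp, text) pairs (assumed format).
    """
    timeline = defaultdict(Counter)
    for created_at, text in tweets:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(created_at).replace(second=0, microsecond=0)
        timeline[minute].update(tag.lower() for tag in HASHTAG.findall(text))
    return dict(timeline)


def first_minute_ahead(timeline, tag, rival):
    """Return the earliest minute in which `tag` was used more than `rival`."""
    for minute in sorted(timeline):
        counts = timeline[minute]
        if counts[tag] > counts[rival]:
            return minute
    return None


# Made-up sample data for illustration.
sample_tweets = [
    ("2014-10-01T12:00:05", "#selfie time"),
    ("2014-10-01T12:00:40", "#selfie again #snapchat"),
    ("2014-10-01T12:01:10", "#snapchat #snapchat rocks"),
    ("2014-10-01T12:01:30", "Try #Snapchat"),
]
timeline = hashtag_timeline(sample_tweets)
overtaken_at = first_minute_ahead(timeline, "snapchat", "selfie")
```

Coarser buckets (hours or days) would work the same way by truncating the timestamp further.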
More importantly, the TCAT system produces dynamic interactive network graphs that show how all these things are connected. The graphs are visualized, sized, and color coded by specific and precise algorithms, which makes interpretation straightforward and intuitive. All the graphs can be downloaded to Excel for future use.
“It’s useful as a tool to find influential users, and to engage users that are of particular interest,” said Groshek. “We can target specific groups and clusters of users that are talking with each other about a topic in a certain way.”
Groshek says the TCAT tools he uses most often are the co-mention and the co-hashtag graphs. When combined, the two graphs show the most visible people using a group of hashtags, how they connect with others, as well as which users are the most influential in spreading messages to diverse user groups. He regularly works with journalists to find sources and to get a sense of which users and keywords are engaged most often, and to leverage those connections to grow audiences.
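For a rough sense of what sits behind these two graphs, the Python sketch below builds a co-hashtag edge list (two hashtags are linked when they appear in the same tweet) and a simple mention count per user. It is a stand-in under stated assumptions rather than TCAT's actual algorithm: the function names are hypothetical, and counting the distinct authors who mention a user is only one crude proxy for visibility.

```python
import re
from collections import Counter
from itertools import combinations


def co_hashtag_edges(texts):
    """Count undirected edges between hashtags that share a tweet."""
    edges = Counter()
    for text in texts:
        # Deduplicate and sort so ('selfie', 'snapchat') is always one key.
        tags = sorted({tag.lower() for tag in re.findall(r"#(\w+)", text)})
        edges.update(combinations(tags, 2))
    return edges


def mention_in_degree(tweets):
    """Count how many distinct authors mention each user.

    `tweets` is an iterable of (author, text) pairs (assumed format).
    Self-mentions are ignored.
    """
    pairs = set()
    for author, text in tweets:
        for mentioned in re.findall(r"@(\w+)", text):
            if mentioned.lower() != author.lower():
                pairs.add((author.lower(), mentioned.lower()))
    return Counter(target for _, target in pairs)


# Made-up sample data for illustration.
sample = [
    ("ann", "#selfie #snapchat hi @bob"),
    ("cat", "@bob #Snapchat #selfie"),
    ("bob", "@ann #selfie"),
    ("ann", "again @bob"),
]
edges = co_hashtag_edges(text for _, text in sample)
visibility = mention_in_degree(sample)
```

An edge list like this can be loaded into a graph tool for layout, sizing, and coloring, which is the step where the visual patterns described above start to appear.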
But that is far from the TCAT’s only use. “There’s no limit to what this system can effectively be applied to. For example, identifying how brands are being discussed. We can locate which users are talking about a brand, and of those, which are most important to engage to reinforce or redirect what’s being said. And at the end we can monitor how successful the effort has been.”
Figure 2: Influential ‘investing’ Tweeters in 127,997 ‘investing’ Tweets from 62,934 users graphed into 1,000 nodes with 1,068 directed edges
TCAT is just now reaching the market. Early feedback from a group of beta testers from news organizations, law offices, national research organizations, and PR and advertising industry groups has been overwhelmingly positive.
Those interested in using the TCAT system need wait no longer:
- Betweetness Labs Consultancy is currently doing analyses and producing reports directly for clients.
- Alternatively, sponsors can make donations to BU’s Division of Emerging Media Studies (DEMS) and receive lifetime access to the TCAT software. They also get training and ongoing support in using the cloud-based system and related analytic software.
- Groshek has negotiated private and corporate sponsorship programs with DEMS so that more commercial uses and applications can be provided.
In the coming years, similar tools for social media research may become more widely available. For the time being, Groshek believes the TCAT system sets itself apart from social media ‘analytics’ companies, which most often focus on metrics such as followers, likes, retweets, and posting frequency to identify users and topics.
“Our TCAT system is somewhat raw in that it requires a little bit of human thinking to combine the science of algorithms with the art of network analysis, but that is the beauty of it,” Groshek said. “The TCAT is powerful not only because of what it provides automatically and computationally, but also because it puts those metrics directly into the hands of bright people that can act on, and make sense of, patterns observed visually and in data streams.”
Figure 3: Influential ‘Massachusetts politics’ users in 63,843 Tweets about Mass. politics from 15,138 users graphed into 1,000 nodes with 15,056 directed edges
More details on the software and examples of its application are available at http://www.betweetness.com/, and all inquiries should be directed to Jacob Groshek, either at email@example.com or 857-615-4709. ∞