MALLET topic analysis of JCDL + Open Video tweets
I'm working towards some interesting visualizations of the Twitter streams from a number of conferences (starting with JCDL and Open Video this past week). I'm using Judith Bush's very cool gawk script to parse the raw Atom files. My first step was to get topics for each corpus as a whole:
/Applications/mallet/bin/mallet train-topics --input data.mallet --num-topics 10 --output-state topic-stat.gz --output-doc-topics doc-topics --output-topic-keys doc-keys --num-iterations 2000 --optimize-interval 2500
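The file named by --output-topic-keys (doc-keys in the command above) has one line per topic: the topic number, the topic's Dirichlet weight, and then its top words. For the later visualization work it may help to read that file into a structure. Here is a minimal Python sketch, assuming the whitespace-separated layout shown in the listings below; the names TopicKey, parse_topic_keys and load_topic_keys are my own and not part of MALLET.

```python
from typing import Iterable, List, NamedTuple


class TopicKey(NamedTuple):
    """One line of a topic-keys file (hypothetical helper type, not a MALLET API)."""
    topic_id: int
    weight: float      # Dirichlet weight reported for the topic
    words: List[str]   # top words, in the order they appear in the file


def parse_topic_keys(lines: Iterable[str]) -> List[TopicKey]:
    """Parse lines laid out as 'id weight word word ...' into TopicKey records."""
    topics = []
    for line in lines:
        fields = line.split()      # splits on tabs and spaces alike
        if len(fields) < 2:
            continue               # skip blank or truncated lines
        topics.append(TopicKey(int(fields[0]), float(fields[1]), fields[2:]))
    return topics


def load_topic_keys(path: str) -> List[TopicKey]:
    """Read a topic-keys file from disk, e.g. load_topic_keys('doc-keys')."""
    with open(path, encoding="utf-8") as handle:
        return parse_topic_keys(handle)


if __name__ == "__main__":
    # Two lines copied from the JCDL listing, used here as sample input.
    sample = [
        "8\t5\tjcdl digital tags question collections social wikipedia war",
        "9\t5\tworkshop people time quality study alan live archive idea lots",
    ]
    for topic in parse_topic_keys(sample):
        print(topic.topic_id, topic.weight, " ".join(topic.words[:3]))
```

Splitting on any whitespace keeps the sketch indifferent to whether the columns are separated by tabs or by spaces, which is convenient when the listing has been copied out of a web page.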
JCDL
0 5 http bit ly org interesting marshall analysis wolf week existing pizza people
1 5 jcdl books data don works problem target foundation facilitate creating
2 5 jcdl libraries evaluation future discussion day multiple public lots univ
3 5 jcdl paper lightweight music back issues funny build dog
4 5 session user talk search talking papers documents great collection type tatted
5 5 conference library good mentors content students focus run building pints
6 5 jcdlgoogle www law participation dl dchud online nice bats duck
7 5 jcdl austin poster google tomorrow small librarian tonight nice
8 5 jcdl digital tags question collections social wikipedia war
9 5 workshop people time quality study alan live archive idea lots

Open Video
0 5 video conference open source net making metadata mozilla developers adobe brokep learned system presentation long openvideo ly app msf
1 5 openvideo ovc tv time gd week vlc html stuff folks nyc platform google meet checking slides startrek kdnlf ll
2 5 media goodman amy watch good great mainstream idea war days im tr flash tpb change put films class devine
3 5 openvideo youtube rt videos session world xenijardin system room art doesn show iran channel film audio totally activism presentation
4 5 openvideo content pirate public live sunde peter cc jardin project creative keynote speaker ogg sweden twitpic licensed seminar fisl
5 5 openvideo people internet talk access day tinyurl conf vid online storytelling awesome working hack digital miro final evolution similar
6 5 openvideo de free en la xeni el years amazing copyright film blog education works closed msurman tk iranian tagged
7 5 openvideo amp ted don great work culture fair back editing question technology site cable id lecture wiki form youtube
8 5 http bit ly check interviews wrap royblumenthal creativecommons based casts ll website footage archives ogg rad blogposts
9 5 openvideo org www openvideoconference make http web watching foss roflmemes put hope sessions online cool launches marketing rest rt

Future work will include temporal analysis and "speaker" analysis.