From a pool of more than 1,000 applicants, 90 participants from 37 countries were invited to Al Jazeera's media hackathon, CANVAS, in Doha, Qatar.
Our team of six built a working prototype of "Lasertag," a tool that helps media organizations regain value from their archives by automatically suggesting relevant content, links and embeds for journalists to include in their stories. It uses an algorithm that pulls stories from a news organization's API (we tested it on Al Jazeera's API) and creates connections between stories that share noun phrases. During the hackathon, this took more than 15 million database operations and produced more than 1 million connections.
Lasertag won the hackathon's prize for best tool for media production and was featured by media outlets including GigaOm.
Journalists and web producers know that the archives contain valuable context for current stories, but many avoid diving deep into the archives to find it. Why? Often, those archives are in complex, proprietary systems that offer little more than a search box. More often, the person searching for that context doesn't have the time to dive through dozens or hundreds of stories to find the ones that are the most relevant. What if a simple algorithm could make the first pass for them? Then the journalist just has to pick the best stories for context using human judgement and news organizations can capture some of the value now trapped in those archives.
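As a rough illustration of that first pass, the sketch below ranks archive stories by how many noun phrases they share with the story being written. It assumes noun phrases have already been extracted into sets. The function name `suggest_context`, the story IDs and the sample phrases are all hypothetical, and this is not Lasertag's actual code.

```python
from typing import Dict, List, Set, Tuple


def suggest_context(
    draft_phrases: Set[str],
    archive: Dict[str, Set[str]],
    top_k: int = 3,
) -> List[Tuple[str, int]]:
    """Rank archive stories by the number of noun phrases shared with a draft."""
    scored = []
    for story_id, phrases in archive.items():
        shared = len(draft_phrases & phrases)  # size of the set intersection
        if shared > 0:
            scored.append((story_id, shared))
    # Highest overlap first; ties broken by story ID for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical, hand-written data for illustration only.
archive = {
    "story-a": {"doha", "world cup", "stadium construction"},
    "story-b": {"doha", "climate talks"},
    "story-c": {"oil prices", "opec"},
}
draft = {"doha", "world cup", "migrant workers"}

suggestions = suggest_context(draft, archive)
print(suggestions)
```

The journalist would still make the final call: the ranking only narrows hundreds of archive stories down to a short list worth reading.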
For this proof of concept, we used 3,900 stories from the Al Jazeera API. Finding noun phrase overlap required 15.2 million database operations and resulted in more than 1 million relationships. The proof of concept works: given a story, it returns highly relevant stories based on that story's content.
Next steps: The algorithm will need to be optimized, and faster processing methods will have to be found, before it can go live. Testing it against a deeper archive with more stories would also be ideal.
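To see where the cost might come from, note that comparing every story against every other story grows quadratically. For 3,900 stories, the number of ordered story pairs is about 15.2 million, which is close to the operation count reported above. Whether that is how the count actually arose is an assumption, not something this write-up states. The sketch below shows that arithmetic, followed by an inverted index, a standard technique swapped in here purely as one possible optimization: it only visits story pairs that actually share a phrase. All names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

# Ordered pairs among 3,900 stories (assumed reading of the reported figure).
n_stories = 3900
ordered_pairs = n_stories * (n_stories - 1)
print(ordered_pairs)  # 15206100


def build_connections(stories: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count shared noun phrases per story pair using an inverted index."""
    # Map each noun phrase to the set of stories that contain it.
    index = defaultdict(set)
    for story_id, phrases in stories.items():
        for phrase in phrases:
            index[phrase].add(story_id)

    # Only pairs that co-occur under at least one phrase are ever visited.
    connections = defaultdict(int)
    for story_ids in index.values():
        for a, b in combinations(sorted(story_ids), 2):
            connections[(a, b)] += 1
    return dict(connections)


# Hypothetical, hand-written data for illustration only.
stories = {
    "story-a": {"doha", "world cup"},
    "story-b": {"doha", "climate talks"},
    "story-c": {"oil prices"},
    "story-d": {"doha", "world cup", "fifa"},
}
connections = build_connections(stories)
print(connections)
```

An inverted index helps most when phrases are spread thinly across the archive. A phrase that appears in nearly every story still produces a near-quadratic number of pairs, so very common phrases would likely need to be filtered or down-weighted.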