
NCTC Dataset

The NCTC Worldwide Incidents Tracking System

The National Counterterrorism Center’s (NCTC) Worldwide Incidents Tracking System (WITS) is a database of terrorist incidents. The WITS database covers 1 January 2004 to 31 March 2010.

The database can be downloaded in XML or as an Oracle DB from http://www.nctc.gov/witsbanner/wits_subpage_exports.html
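If you take the XML export, a first pass could be as simple as counting incidents per country. The sketch below is only an illustration: I don't know the real WITS schema, so the element names `<Incidents>`, `<Incident>` and `<Country>` and the sample data are assumptions you would need to adjust after opening the actual file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the real WITS export schema may differ, so the
# element names <Incidents>, <Incident> and <Country> are assumptions.
SAMPLE_XML = """
<Incidents>
  <Incident><Country>Iraq</Country></Incident>
  <Incident><Country>Pakistan</Country></Incident>
  <Incident><Country>Iraq</Country></Incident>
</Incidents>
"""

def count_incidents_by_country(xml_text: str) -> Counter:
    """Count incidents per country, assuming one <Country> child per <Incident>."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for incident in root.iter("Incident"):
        # findtext returns the default when the <Country> element is missing
        country = incident.findtext("Country", default="Unknown").strip()
        counts[country] += 1
    return counts

if __name__ == "__main__":
    for country, n in count_incidents_by_country(SAMPLE_XML).most_common():
        print(country, n)
```

For a downloaded file, `ET.parse(path).getroot()` gives you the same root element without reading the whole text into a string first.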

Internet Movie DataBase (IMDB) dataset


IMDb is probably the most famous movie website, and its database can be accessed from

http://www.imdb.com/interfaces

It can also be downloaded as a set of text files from

ftp://ftp.fu-berlin.de/pub/misc/movies/database/

That’s 1,500,000 films!

Global Terrorism Database

The database is maintained at the University of Maryland. Instructions for downloading the entire dataset are available here.


Terrorism Resource Guide

It has links to several terrorism-related datasets available online.


TrackingTheThreat.com

As the website says, this is a “database of open source information about the Al Qaeda terrorist network”. You can query the database through its website, but I haven’t found a way to download the entire dataset yet.


VAST 2010 Challenge

The VAST 2010 Challenge website is here. Below are the bits of information I found important:

Dates

Deadline: June 29

Notification: August 10

Challenges

This is a summary of the challenges. Go here for the details.

Mini Challenge 1: Forensic Analysis of Illegal Arms Dealing

Organize results according to country

1.1 Detailed answer: activities in each country and prediction

1.2 Detailed answer: associations among the players, shown through a social network

Required skills:

  • text processing
  • geographical visualization
  • social network visualization
  • temporal visualization
  • possibly data mining/machine learning (for prediction)
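For the social network part of 1.2, one simple starting point is a co-occurrence graph: two players are linked whenever the same report mentions both. This is my own sketch, not the organizers' method; it assumes each report has already been reduced (by the text-processing step) to the set of people it mentions, and the names below are made up.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: each report reduced to the set of people it mentions.
# Names are invented for illustration only.
REPORTS = [
    {"Alice", "Bob"},
    {"Bob", "Carol"},
    {"Alice", "Bob", "Dave"},
]

def build_cooccurrence_graph(reports):
    """Undirected weighted graph: edge weight = number of reports shared by two people."""
    graph = defaultdict(lambda: defaultdict(int))
    for people in reports:
        for a, b in combinations(sorted(people), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def degree_ranking(graph):
    """Rank people by number of distinct associates (degree), ties broken by name."""
    return sorted(graph, key=lambda p: (-len(graph[p]), p))

if __name__ == "__main__":
    g = build_cooccurrence_graph(REPORTS)
    print(degree_ranking(g))  # most connected players first
    print(dict(g["Bob"]))     # Bob's associates and how often they co-occur
```

Degree is a crude measure of importance; the weighted edges can also be fed straight into a graph visualization tool.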

Mini Challenge 2: Hospitalization Records – Characterization of Pandemic Spread

2.1   Detailed answer: characterize the spread of the disease.

2.2   Detailed answer: compare outbreaks across cities.

Required skills:

  • Database/SQL skills
  • Geographical visualization
  • Temporal visualization
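The Database/SQL and temporal parts of Mini Challenge 2 mostly come down to grouping admissions by city and date. Here is a minimal sketch using an in-memory SQLite table; the table name `hospitalizations`, its columns and the rows are hypothetical, since I haven't described the real data layout here.

```python
import sqlite3

# Hypothetical schema and rows; the real challenge data layout may differ.
ROWS = [
    ("CityA", "2009-04-01"),
    ("CityA", "2009-04-02"),
    ("CityA", "2009-04-02"),
    ("CityB", "2009-04-03"),
    ("CityB", "2009-04-05"),
    ("CityB", "2009-04-05"),
    ("CityB", "2009-04-05"),
]

def load(rows):
    """Create an in-memory table of (city, ISO-format admission date) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hospitalizations (city TEXT, admit_date TEXT)")
    conn.executemany("INSERT INTO hospitalizations VALUES (?, ?)", rows)
    return conn

def daily_counts(conn):
    """Admissions per city per day, ordered so each city forms a time series."""
    return conn.execute(
        """SELECT city, admit_date, COUNT(*) AS n
           FROM hospitalizations
           GROUP BY city, admit_date
           ORDER BY city, admit_date"""
    ).fetchall()

def peak_day_per_city(conn):
    """Day with the most admissions in each city; the earliest date wins ties."""
    peaks = {}
    for city, day, n in daily_counts(conn):
        if city not in peaks or n > peaks[city][1]:
            peaks[city] = (day, n)
    return peaks

if __name__ == "__main__":
    print(peak_day_per_city(load(ROWS)))
```

Storing dates as ISO strings keeps them sortable in SQL; comparing peak dates and peak sizes across cities is one rough way to approach 2.2.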

Mini Challenge 3: Genetic Sequences – Tracing the Mutations of a Disease

3.1   Short answer: where is the origin of the outbreak (region or country)?

3.2   Short answer: which patient is likely to have contracted the virus from Nicolai and why?

3.3   Short answer: Identify the top 3 mutations that lead to an increase in symptom severity.

3.4   Detailed answer: Identify the top 3 mutations that lead to the most dangerous viral strains

Required skills:

  • Sequence/string analysis
  • Geographical visualization
  • Temporal visualization
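For the sequence analysis in Mini Challenge 3, the simplest thing I can think of is comparing aligned sequences position by position: listing point mutations against a reference and using Hamming distance to find the patient whose virus is closest to Nicolai's (question 3.2). This is a deliberately naive sketch, not the expected solution; it assumes equal-length, already-aligned sequences, and the reference, patient IDs and sequences below are invented.

```python
# Hypothetical aligned sequences of equal length; the real data format may differ.
REFERENCE = "ACGTACGTAC"
SEQUENCES = {
    "Nicolai":  "ACGTTCGTAC",
    "patient1": "ACGTTCGTAA",
    "patient2": "TCGAACGTAC",
}

def mutations(reference, seq):
    """Point mutations as (0-based position, reference base, new base)."""
    if len(reference) != len(seq):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, seq)) if r != s]

def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return len(mutations(a, b))

def closest_to(name, sequences):
    """Other sequence with the smallest Hamming distance to `name`; ties broken by ID."""
    target = sequences[name]
    others = (k for k in sequences if k != name)
    return min(others, key=lambda k: (hamming(target, sequences[k]), k))

if __name__ == "__main__":
    print(mutations(REFERENCE, SEQUENCES["Nicolai"]))
    print(closest_to("Nicolai", SEQUENCES))
```

Real viral sequences would first need alignment (insertions and deletions break a position-by-position comparison), and linking mutations to symptom severity (3.3, 3.4) needs the patient records as well.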


Grand challenge: combine the 3 mini challenges

Other information

Entries will be judged on:

  • Accuracy metrics will evaluate the correctness of the data table answers based on the known “ground truth” embedded in the dataset.  These scores will be given to the teams.
  • The qualitative metrics will evaluate the perceived utility of the system, including the visualizations and the analytic process used. These will be based on participants’ descriptive explanations (short or detailed answers, and the video).

Both commercial software and research prototypes are allowed.

Teams are encouraged to collaborate with other teams that have complementary skills. The organizers can help.

The datasets can be downloaded here. There are three datasets, but I can’t add the second one here because it is over the WordPress size limit. Here are the Mini Challenge 1 data files and Mini Challenge 3 data files.

Submission

(Useful for understanding what each challenge requires)

Short answers:

  • Mini challenge only
  • 150 words: the answer and how it was arrived at
  • 2 screenshots

Detailed answer:

  • Both mini and grand challenge
  • Details of how you arrive at the answer
  • For mini challenges: 1000 words and 5 screenshots
  • For grand challenge: fewer than 5000 words and at most 15 screenshots are recommended, but there is no hard limit.

A video is required to show how the analysis was done.

  • With voice narration
  • 4 min for mini challenge
  • 15 min for grand challenge

Debrief

  • Grand challenge only
  • 2000 words
  • Describe hypothesis

Two-page summary

  • Optional
  • Submit after competition results are announced
  • Included in the conference printed material