Seminars in Social Networks and Markets

Seminars in Social Networks and Markets -- Academic Year 2016/17 -- Final Projects

Final Projects

For the final project you have two options: a programming project or a literature survey.
The programming project can be developed individually or by a team of 2 people.
The literature survey has to be written individually.

For the programming project:

A typical project consists in mining a particular network (a social network, for example, or a network from the natural sciences, linguistics, social sciences, etc.) to compute interesting statistics and properties, discover communities, and so on. See the list below for some specific examples. However, you can also try to propose a different idea if you have one.
When you find a topic send an email (with subject "[SEMSNM]") to Dr. Bonifaci describing your idea and the size of data that you expect to work on: how many nodes and arcs do you expect your network to have? This is to make sure that your project is doable and that it suffices for the course.
You can use the language and/or library of your choice, as long as it does not provide directly any algorithm that you have to implement for the project!
At the exam you should hand in the code and do a presentation of about 20 minutes.
If you have any questions about anything related to the project, mail to Dr. Bonifaci.

Here are some concrete project ideas:

Compute basic statistics (degree distribution) and advanced statistics (clustering coefficient, closeness/betweenness centrality, etc.) for one of the networks of the Stanford Large Network Dataset Collection. If the network is too large, you should use some approximation algorithm (see below for references). Visualize the results using a plotting software such as gnuplot, Gephi or R.
Implement some generative network model that attempts to capture the structure of real-world transportation and communication networks. For example, try to reproduce the experimental results in one or more of the following models: model for the Internet, model for distribution networks. Other relevant papers: Gastner and Newman 2006b, 2006c.
Implement the fast approximation algorithms of Schank and Wagner for the clustering coefficient and transitivity measures, and use it to compute the clustering coefficient and transitivity of a large network.
Implement the fast approximation algorithm of Eppstein and Wang for closeness centrality and use it to compute the closeness centrality of the nodes of a large network.
Implement one of the fast approximation algorithms of Riondato and Kornaropoulos for betweenness centrality and use it to compute the betweenness centrality of the nodes of a large network.
Crawl a part of the Facebook network using the Facebook Graph API. Gather friends, and create the social network of people who participate. Compute statistics such as degree distribution, clustering coefficient, average distance, diameter, etc.
Compute influence and passivity of Twitter users using the Twitter API. Reference.
Detect communities on some network (for example, on the Italian actor collaboration network from IMDB; you can use OMDB to extract the data), using the divisive algorithm of Newman and Girvan, or using an agglomerative method by Newman, also based on the modularity measure. When the network is very large (hundreds of thousands of nodes, millions of edges), you can try the fast heuristic by Clauset et al.
Study the structure of a linguistic network. For example, write a program to construct the word co-occurrence network of your favorite book from liberliber.it or gutenberg.org, draw the rank/frequency plot as explained here and determine the exponent of the power law distribution that best fits the data. See this paper for other details on word co-occurrence networks.
Can you use network analysis to help decrypt a mysterious text? The Voynich manuscript is a medieval code that has resisted all attempts at being deciphered. Study the word distribution and word co-occurrence network of existing digital versions of the manuscript and see if you can reproduce some of the results in the research papers of Landini and/or Montemurro et al.
Use Octave to implement a simulator of a traffic network. Given a network with demands and linear link costs, your simulator should compute an equilibrium of the selfish routing game and graphically display it.

Useful links and tools:

Datasets
- SNAP - Stanford Large Network Dataset Collection
- KONECT - Koblenz Network Collection
- netdata - M.J. Newman's network datasets in GML format
- Pajek - Miscellaneous network datasets in NET format
Visualization
- Cytoscape - Complex network analysis and visualization platform
- Gephi - Graph visualization and exploration platform
- Graphviz - Graph visualization software
- Gnuplot - Plotting utility
Languages and libraries
- Octave - High-level language for numerical computing
- Julia - High-level high-performance language for numerical computing
- GraphChi - Disk-based large-scale graph computation in Java and C++
- NetworkX - Python package for complex networks
- METIS - Graph partitioning tools and C library
- MASON - Fast discrete-event multiagent simulation library core in Java
- NETLOGO - Multiagent programmable modeling environment

For the literature survey:

You can start with one of the following papers:
or with a paper from the proceedings of
that is relevant to the class material. You can also use the articles linked in the "Optional reading" column of the class diary as a starting point.
You should read more papers related to the topic of the initial paper that you chose. To do that, you should check the previous work (found by the citations at the end of the paper), and any new work (you should use Google Scholar for that). Overall, you should read at least three papers to include in the survey.
You should prepare the survey, which should be a document of about 15 pages, that describes the area related to the papers, with a summary of their results, and a list of related papers that you have found.
At the exam you should hand in the survey and do a presentation of about 20 minutes. You will be asked questions to ensure that you completely understand the material you present.
Before starting working on the paper, you should send an email (with subject "[SEMSNM]") to Dr. Bonifaci describing the initial paper that you found and the area for which you will write the survey, to make sure that it suffices for the course.
If you have any questions about anything related to the survey, mail to Dr. Bonifaci.