Homework, CS 598-GA, Spring 2026
All homework is due in Moodle, unless otherwise indicated, and typically on Monday by 10 PM.
Homework includes reading assignments, which will form the basis of discussions in class.
You are expected to read these before coming to class.
When a homework has a writing assignment, follow the instructions about
not using assistive-AI, as follows.
For any writing assignment, focus on communicating ideas, and don't worry about
style. Do not use assistive AI for generating text, improving text, etc.
You are welcome to use very simple methods like WORD to find errors in spelling and grammar, however.
Homework assignments
- Jan 26 (HW 1)
- Reading assignment.
- Leskovec, J., Lang, K.J. and Mahoney, M., 2010, April. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th international conference on World wide web (pp. 631-640).
(link)
- Yang, J. and Leskovec, J., 2012. Defining and evaluating network communities based on ground-truth. In Proceedings of the ACM SIGKDD workshop on mining data semantics (pp. 1-8). (link)
-
Writing assignment:
- Write a 2-3 page summary of the papers and your thoughts about the questions identified.
Be prepared to discuss in class.
- Feb 2 (HW 2).
-
Reading assignment:
- Mining Complex Networks: read Chapter 1.1-1.11 and Chapter 5.1-.5.6.
- Fortunato, S. and Barthelemy, M., 2007. Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1), pp.36-41.
(link)
- Traag, V.A., Van Dooren, P. and Nesterov, Y., 2011. Narrow scope for resolution-limit-free community detection. Physical Review E -- Statistical, Nonlinear, and Soft Matter Physics, 84(1), p.016114 (9 pages).
(link)
-
Writing assignment:
- Provide a PDF paper, at least 2 pages with separate page of references,
about your thoughts regarding the resolution limit and the concept of ``resolution-limit-free".
Are you satisfied by this resolution-limit-free concept? Would you have proposed something different?
You can reference the reading list above as well as any other papers you find that are relevant.
- Feb 4 (HW 3)
- Reading assignment:
Vu-Le et al.,
EC-SBM synthetic network generator.
Applied Network Science 2025
(link).
-
Writing assignment:
1-2 page document, summarizing what you
understand and think about the paper,
and listing 2 questions you will ask
The-Anh after his talk.
- Feb 9 (HW 4)
- Prove that for all networks N, optimizing under CPM never produces disconnected
clusters.
- Prove that for all networks N, optimizing under CPM satisfies Traag et al.'s definition of
resolution-limit-free.
- Prove or disprove, IKC satisfies Traag et al's definition of resolution-limit-free.
- Analyze what IKC does on a ring-of-cliques, where there are $n$ cliques, each of size $m$.
- Let $N_k$ be the infinite family of networks that has two components, where one is a singleton node and the other is formed by taking two cliques of size $k$ and connecting them by an edge.
Prove or disprove: for all k>2 and gamma in (0,1), an optimal clustering under CPM(gamma) will return the k-cliques as individual clusters.
- Make a list of 2-3 papers that interest you and that you would like to present in class and have the entire class read.
(Any papers you like are fine, but you can also pick one or more from the long list of papers I posted).
- Feb 11 (HW 5)
-
For each of the following problems, say whether the community
search problem is polynomial or NP-hard:
- Given network N and vertex v, find the largest value for k such that
v is in a k-truss
(i.e., a subgraph of N so that every edge is in at least k-2 triangles; note that this
is not a vertex-induced subgraph, but an edge-induced subgraph)
-
Given network N and vertex v, find the largest value for k such that
v is in a k-clique
-
Given network N and vertex v, find the largest value for k such that
v is in a k-core
(i.e., a subgraph of N so that every node has at least k neighbors in the
subgraph)
-
Given network N and vertex v, find the largest value for k such that
v is in a k-ECC
(i.e., a subgraph of N that has edge-connectivity at least k)
- Feb 16 (HW 6)
- Let N=(V,E) be a network with 100 nodes.
Suppose the true clustering C has two true communities each with 50 nodes.
Let C1 be the estimated clustering that puts all the nodes into singleton clusters,
and C2 be the estimated clustering that puts all the nodes into one cluster.
Compute accuracy/error for the following pairs of clusterings:
C and C1, and C and C2.
For the accuracy measures, report AMI, ARI, NMI, precision, recall, and F1.
Calculate TP, FP, FN.
Report node coverage.
You are welcome to use software to do these calculations.
Comment on what you learn.
- Reading and writing assignment:
Find one or more papers from the last 15 years on overlapping community detection
or community search, which is also called local graph clustsering; in addition to using Google Scholar or some other
source, you can look at the papers listed at https://tandy.cs.illinois.edu/598-2026-assignedpapers.html.
Write up a summary of the paper along with your thoughts about the
paper.
Be prepared to discuss this in class.
- Feb 23 (HW 7)
- This assignment will involve data analysis.
You will apply community detection methods to EC-SBM synthetic
networks, which come with
ground truth communities; this will allow you to evaluate
accuracy.
You will report empirical statistics about
the communities you obtain (e.g., distribution of community
sizes and edge densities, mixing parameters, percentage of
nodes not in any cluster of size at least two).
You will also report accuracy using at a minimum AMI, ARI, and
NMI.
You will also write up your study with details
sufficient to be reproducible and
with commentary about what you learned.
See this page for
details.
- March 2 (HW 8)
-
You are considering running WCC or CM with your clustering methods, but you aren't very
satisfied by the ad hoc nature of WCC and CM.
Specifically, a cluster will be split into two parts by removing an edge cut
whenever the edge cut size is at most the threshold (log10(n)) used by WCC and CM.
Can you come up with a better rule for deciding whether to cut the cluster into two
parts? Comment on each of the following:
-
For any clustering, deciding to remove a small edge cut only if the edge density in each part is larger than the edge density of the initial cluster,
- For any clustering, deciding to remove the small edge cut if the total number of edges that go between
clusters decreases compared to the initial clustering,
- If your cluster is part of a modularity clustering, deciding
to remove the small edge cut if the total modularity score increases,
- If your cluster is part of a CPM(gamma) cluster,
deciding to remove the small edge cut if the total CPM(gamma) score
increases
Feel free to suggest something that you think might be a better guideline
for when to remove the small edge cut than
any of the ones listed above.
Participation activities
Please note that in Moodle I have listed assignments
related to class participation.
These are mainly: submit questions to the presenter
the night before the presentation.
These assignments do not count towards the homework grade but
do count towards your class participation.