Time: TuTh 11-12:15 PM, 0218 Siebel Center.
Tandy's Office hours : Wednesdays 4-5 PM by zoom (link to be sent by email) and in SC 3235 on Thursays 12:15-1:00 PM.
Announcements:
Textbook and course materials Most of the reading assignments will be to journal papers (see the current list); these will be part of the homework assignments. However, we will also use as a basic reference (textbook) Mining Complex Networks, second edition. If you have trouble accessing this second edition, you can see the first edition at Mining Complex Networks, published by Chapman and Hall, 2021. You should be able to download this by using your UIUC affiliation. For this textbook, we will mainly use material from Chapters 1-3, 5, 6, and 8.
Lectures: These will be posted here at least 24 hours in advance. You are expected to read these before coming to class.
Homework See this page for the list of assignments.
Guest lecturers We will have 11 guest lectures from researchers here at UIUC and from around the globe. See this page for the current list.
Course description: This is a course on applied algorithms, focusing on the use of discrete mathematics, graph theory, probability theory, statistics, machine learning, and simulations, to design and analyze algorithms for community detection, community search, and community extraction in large graphs (e.g., social networks, biological networks, and citation graphs) with millions of nodes, with the goal of making important breakthroughs in either theory or development of improved scalable methods. We will examine these questions from both a theoretical perspective (e.g., computational complexity and design of algorithms for hard optimization problems, resolution limit) as well as from a data-driven perspective. Of particular interest in this course are how well existing methods actually perform on large real-world and synthetic datasets in terms of cluster quality, which includes density, separability, and well-connectedness (e.g., the size of the minimum edge cut). Clustering accuracy will also be explored using synthetic networks, and so limitations of current simulators are also of interest. The course will involve substantial literature review and critique, hands-on assignments using existing codes on large networks, and a final course project that is designed by the student, and which should be presented in a final paper that is suitable for submission to a conference such as KDD, Neurips, or Complex Networks and their Applications. This course is designed for PhD students in Computer Science with an interest in algorithm design and implementation (including parallel and HPC implementations), but students from closely related programs (e.g., ECE, ISE, Mathematics, and Statistics) with strong theoretical backgrounds are welcome.
A rough outline of the semester is as follows:
Specific topics we will cover:
My research in this field I have an active collaboration with George Chacko in network science and its relevance to understanding the development of scientific communities, with papers that range from pure theory to development of new algorithms and software to analyses of citation networks. See this page for more about our collaboration, which is funded by an NSF grant from the Office of Advanced Cyber-Infrastucture (OAC 2402559).
Prerequisites Graduate status in CS with background in algorithms (e.g., CS 374, CS 473). Ability to program.
Who should take this class: The course is designed for PhD students in CS, ECE, Math, Statistics, and ISE, or students planning to apply for PhD programs in these fields. Students from other programs may not have the mathematics training that is needed. In addition, students who are not in PhD programs and who do not plan to apply for a PhD program may not enjoy the focus on research and mathematics.
Undergraduate students: If you are an advanced undergraduate student and interested in taking the course, please email me to discuss your qualifications. I generally do not let undergraduate students into the class because this is a research-focused advanced course requiring many different skills (including theorem proving, implementation, analysis of algorithms, scientific literature reviews, etc.). However, if you are sufficiently advanced (preferably a senior with substantial coursework already completed that shows these multitude of skills), serious about the commitment necessary to do this course, and planning to apply for PhD programs, then I may allow you into the class.
Homework See this page for the list of assignments. These will be posted at least a week in advance, and should be submitted in Moodle. Homework assignments will include writing, running experiments, and doing problem sets. A homework submitted after the posted deadline but within 24 hours will get 80% credit; no submissions are allowed after that time. All homeworks will be graded out of 100 points, and the worst homework grade will be dropped. Some of the assignments may involve doing analyses on the Campus Cluster and several will involve writing. Please note that I do want you to use assistive AI for writing.
Midterm March 10, in-class exam, without access to laptops or cellphones. You will be allowed to bring 1 page of notes (two-sided) with you.
Course project The course requires a final project of each student. You are strongly encouraged to do a research project, but you can also do a survey paper on some topic relevant to the course material. In both cases, your project should be a paper of at least 3000 words (not including the bibliography) in a format and style appropriate for submission to a journal such as Applied Network Science or a conference such as KDD, Neurips, or Complex Networks and Their Applications. Research projects can involve two students, but survey papers must be done by yourself. (Note: When projects are done by two students, the division of the work should be communicated in the write-up, and each student should submit their own course project write-up. Please see me to discuss requirements regarding division of work and write-ups for your specific project, if this applies to you.) Grades on the final project depend upon the kind of project you do. For a research paper, your grade will be 30% writing and 70% content. If you do a survey paper, the grade will be 40% writing and 60% content. In both cases, you should include a thoughtful discussion of the relevant literature and have an appropriate bibliography. Note also the requirements for reproducibility (for research papers) and the expectations about writing quality. I meet with each group several times during the semester to help them make progress.
Course participation An essential part of this course is the class discussion of the research topics that are presented. What this means in terms of the grade is that you should expect to be called on to answer questions, and you should also ask questions. Please plan on being actively involved!
Presentations in class Every student will present at least one paper, selected from the research literature. These papers could be suggested by the student and related to the research question that the student will study for their course project, or can be suggested by the instructor. Every student will also present their course project proposal, and (time permitting) the progress report on the course project.
Assigned reading Some of what your grade is based on requires doing assigned reading from the textbook and papers (both recent and not so recent) in the scientific literature (see the current list). These reading assignments will be listed in the Moodle page for the course. You are expected to complete these reading assignments before coming to class, so that we can discuss the material as a group during the class meeting.
Guidance on writing assignments.
Many of the activities in this course involve writing,
and the grade for these assignments depends in part on
quality of the writing.
This is
specifically true for the final project.
It's very important that you familiarize yourself
with expectations about scholarly writing, and in particular
with how to avoid plagiarizing.
Please see the information in the Academic Integrity page
and specifically note the instructions about plagiarism and how
paraphrasing improperly can count as plagiarism.
General advice about writing is available at
this page.
Note also that the use of ChatGPT or other assistive AI is not allowed
in any assignments, other than scripting.
Laptop and cellphone policy During class time, unless the class is held by zoom, only those people presenting material should use laptops, and no one should use cellphones (see this NYT article and also this article). Lectures presentations (PDF or PPTX) will be posted and available at least 24 hours before class, and you are expected to review the lectures before coming to class.
Generative AI policy The use of generative AI to develop code for homework assignments or course project is permitted, but must be fully disclosed (at the top of the submitted material). However, generative AI is not permitted for writing assignments beyond finding spelling and grammar mistakes. Please do not use any generative AI to produce text, or rewrite your text.
Absence policy Absences are allowed but the student is required to learn what was covered when they were not attending the lecture. Note that the lectures are not recorded! For this reason, course presentations are provided on the course webpage.
COVID-19 precautions If you are ill with anything, please either do not come to class or office hours, or else wear an N95 mask. In addition, if you have COVID-19, have been exposed to COVID-19, or have recently tested positive for COVID-19, do not come to class or in-person office hours.