In this course, we will explore the leading edge of machine learning systems, focusing both on research advancements and trends in industry. We will teach you the conceptual and engineering skills necessary to apply machine learning techniques to petabyte-scale datasets. About half of the material will be conceptual and half practical; we will explore different approaches that different machine learning systems and frameworks take, as well as their tradeoffs, and we will focus on building machine learning and analysis pipelines with a selection of such systems/frameworks.
This class consists of:
- Lectures. Lectures will cover technical content, and be delivered by the professors or other instructors.
- Guest lectures. We will hear from researchers and/or practicing software engineers on topics related to engineering ML pipelines and systems.
- Homework assignments. There will be a handful of programming assignments designed to get your hands dirty using some of the systems and frameworks we learn about in class to scalably solve ML/analytics problems.
- A project broken into two parts over the course of the semester. You will use your new skills to answer an interesting question on extremely large datasets. This will require you to build your own machine learning pipeline, either distributed in the cloud, or using interesting architectures like GPUs, or both, and to serve predictions from your ML system rapidly to many clients.
- Two midterm exams. There will be two midterms, one at the halfway point of the semester, and one at the end. Exams will take place during the scheduled class time and are designed to reinforce the knowledge obtained throughout the class.
An introductory course in machine learning, like 10-601 or 10-701, is a prerequisite or a co-requisite. If you plan to take this course and 10-601 concurrently, please tell the instructor. The course will include several substantial programming assignments, so an additional prerequisite is 15-211, 15-214, 17-214, 17-514, or comparable familiarity with Python/Java and good programming skills.
After this course, students will have learned to…
- Implement machine learning techniques at scale, either across distributed machines, or across many cores (e.g., GPUs).
- Understand the difference between emerging machine learning systems and frameworks and be able to make decisions on the tradeoffs between such systems.
- Deal with all aspects of the machine learning pipeline at scale, from data cleaning and quality, to deploying and monitoring prediction serving systems.
- Debug distributed and concurrent code running in the public cloud.
- Understand the unique software engineering aspects of deploying machine learning systems in production, from testing models in development, to feedback cycles and updating and redeploying prediction services as part of the prediction pipeline.
- Understand the tradeoffs between, and be able to employ, various leading-edge techniques for distributed learning.
There will be no required textbook. Instead, book chapters, research papers, whitepapers, and other articles may be provided in addition to the lecture materials.
Your final grade for the course will be based on the following approximate distribution:
- 30% Project (two parts)
- 25% Programming Assignments
- 20% Midterm 1
- 20% Midterm 2
- 5% Participation
The midterms will be in-class, closed-book exams. While they may cover all material to that point in the class, their content will emphasize the material covered since the previous exam.
For more information about the project component, see the project page.
The programming assignments will require you to apply some of the techniques you’ve learned in class to implement part or all of the ML pipeline. Typically assignments will be autograded with unlimited submissions allowed until the assignment deadline.
Teamwork will be essential for the final project component of this course. However, each student is expected to be able to complete the programming assignments on their own, and those assignments will be graded individually.
Students are encouraged to talk to each other, to the TAs, to the instructors, or to anyone else about any of the assignments. Any assistance, though, must be limited to discussion of the problem and sketching general approaches to a solution. Each student must implement their own solutions to the programming assignments.
Consulting another student’s solution is prohibited, and submitted solutions may not be copied from any source. These and any other form of collaboration on assignments constitute cheating. If you have any question about whether some activity would constitute cheating, please feel free to ask the instructors.
Academic Honesty and Collaboration
The University Policy on Academic Integrity applies. The final project is done as a group. Our expectations regarding academic honesty and collaboration for group work are the same as for individual work, elevated to the level of “group.” Group members will collaborate with one another, but groups should work independently from one another, not exchanging code with other groups. Within groups, we expect that you are honest about your contribution to the group’s work. This implies not taking credit for others’ work and not covering for team members that have not contributed to the team.
The course also includes individual assignments and individual components of group assignments. Although your solutions for individual parts may be based on the content produced for the group component (e.g., written reflections), we expect you to complete individual components independently of your group mates.
Regarding the internet, StackOverflow, and similar sources: In real-world development, engineers often adapt code from Q&A sites, open source repositories, or similar sources to new ends. This is acceptable in this course, with two caveats:
- You may not copy a solution for our homework assignments specifically from another student or group, even if, for some reason, that code is available openly on GitHub or elsewhere (see below on the importance of keeping your homework code private).
- You must test all of your code, and those tests must pass. That is, you must understand any code you adapt from the internet, and you must demonstrate that understanding using unit tests.
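As an illustration of the second caveat, here is a minimal sketch of what testing adapted code might look like. The function `top_k_words` and its tests are hypothetical stand-ins for a snippet adapted from a Q&A site; they are not taken from any course assignment, and the attribution comment is only a placeholder for wherever the code actually came from.

```python
import unittest
from collections import Counter


# Hypothetical example: a helper adapted from an online answer.
# Adapted from: <cite the original source here>
def top_k_words(text, k):
    """Return the k most frequent lowercase words in text, most frequent first."""
    words = text.lower().split()
    return [word for word, _ in Counter(words).most_common(k)]


class TopKWordsTest(unittest.TestCase):
    """Tests that pin down the behavior we rely on, including edge cases."""

    def test_orders_by_frequency(self):
        self.assertEqual(top_k_words("a b a c a b", 2), ["a", "b"])

    def test_is_case_insensitive(self):
        self.assertEqual(top_k_words("Spark spark SPARK flink", 1), ["spark"])

    def test_empty_input(self):
        self.assertEqual(top_k_words("", 3), [])

    def test_k_larger_than_vocabulary(self):
        self.assertEqual(top_k_words("x y x", 10), ["x", "y"])
```

Assuming the block is saved as a module, the tests can be run with `python -m unittest`. Writing tests for the edge cases (empty input, `k` larger than the vocabulary) is one way to show that you understand what the adapted code does, not only that it runs.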
Regarding solutions from other students in the course, we reuse the Collaboration Policy from 15-214, with minor modifications:
“You may not copy any part of a solution to a problem that was written by another student, or was developed together with another student. You may not look at another student’s solution, even if you have completed your own, nor may you knowingly give your solution to another student or leave your solution where another student can see it. Here are some examples of behavior that are inappropriate:
- Copying or retyping, or referring to, files or parts of files (such as source code, written text, or unit tests) from another person (whether in final or draft form, regardless of the permissions set on the associated files) while producing your own. This is true even if your version includes minor modifications such as style or variable name changes or minor logic modifications.
- Getting help that you do not fully understand, and from someone whom you do not acknowledge on your solution.
- Writing, using, or submitting a program that attempts to alter or erase grading information or otherwise compromise security of course resources.
- Lying to course staff.
- Giving copies of work to others, or allowing someone else to copy or refer to your code or written assignment to produce their own, either in draft or final form. This includes making your work publicly available in a way that other students (current or future) can access your solutions, even if others’ access is accidental or incidental to your goals. Beware the privacy settings on your open source accounts!
- Coaching others step-by-step without them understanding your help.
If any of your work contains any statement that was not written by you, you must put it in quotes and cite the source. If you are paraphrasing an idea you read elsewhere, you must acknowledge the source. Using existing material without proper citation is plagiarism, a form of cheating. If there is any question about whether the material is permitted, you must get permission in advance. We will be using automated systems to detect software plagiarism.
It is not considered cheating to clarify vague points in the assignments, lectures, lecture notes; to give help or receive help in using the computer systems, compilers, debuggers, profilers, or other facilities; or to discuss ideas at a high level, without referring to or producing code.
Any violation of this policy is cheating. The minimum penalty for cheating (including plagiarism) will be a zero grade for the whole assignment. Cheating incidents will also be reported through University channels, with possible additional disciplinary action (see the above-linked University Policy on Academic Integrity).
If you have any question about how this policy applies in a particular situation, ask the instructors or TAs for clarification.”
Note that the instructors respect honesty in these (and indeed most!) situations.
Take care of yourself
Do your best to maintain a healthy lifestyle this semester by eating well, exercising, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.
All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.
If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.
Diversity, Equity, and Inclusion
We must treat every individual with respect. We are diverse in many ways, and this diversity is fundamental to building and maintaining an equitable and inclusive campus community. Diversity can refer to multiple ways that we identify ourselves, including but not limited to race, color, national origin, language, sex, disability, age, sexual orientation, gender identity, religion, creed, ancestry, belief, veteran status, or genetic information. Each of these diverse identities, along with many others not mentioned here, shapes the perspectives our students, faculty, and staff bring to our campus. We, at CMU, will work to promote diversity, equity and inclusion not only because diversity fuels excellence and innovation, but because we want to pursue justice. We acknowledge our imperfections while we also fully commit to the work, inside and outside of our classrooms, of building and sustaining a campus community that increasingly embraces these core values.
Each of us is responsible for creating a safer, more inclusive environment.
Unfortunately, incidents of bias or discrimination do occur, whether intentional or unintentional. They contribute to creating an unwelcoming environment for individuals and groups at the university. Therefore, the university encourages anyone who experiences or observes unfair or hostile treatment on the basis of identity to speak out for justice and support, within the moment of the incident or after the incident has passed. Anyone can share these experiences using the following resources:
- Center for Student Diversity and Inclusion: (412) 268-2150
- Report-It online anonymous reporting platform: reportit.net (username: tartans, password: plaid)
All reports will be documented and deliberated to determine if there should be any following actions. Regardless of incident type, the university will use all shared experiences to transform our campus climate to be more equitable and just.