CS 5740 SP21

Time: MoWe 1:35PM - 2:50PM
Listing: CS 5740

Instructor: Yoav Artzi
Teaching assistant: Ge Gao
Graders: Kuan-Ting Liu, Wenyi Chu, and Cheng Wang


Peer Evaluation

We will use peer evaluation as part of grading assignments. After all deliverables have been submitted, we will ask you to fill out an online form to officially evaluate your teammates. The purpose of this peer evaluation is to evaluate team citizenship, not technical capability.

You will rate each team member, including yourself. These ratings should reflect each individual’s level of participation and effort and their sense of team responsibility. The scale is as follows:

Excellent: Consistently carried more than their fair share of the workload
Very good: Consistently did what they were supposed to do, very well prepared and cooperative
Satisfactory: Usually did what they were supposed to do, acceptably prepared and cooperative
Ordinary: Often did what they were supposed to do, minimally prepared and cooperative
Marginal: Sometimes failed to show up or complete assignments, rarely prepared
Deficient: Often failed to show up or complete assignments, rarely prepared
Unsatisfactory: Consistently failed to show up or complete assignments, rarely prepared
Superficial: Practically no participation
No show: No participation at all

You will also write brief comments to justify your ratings. Your comments will not be revealed to your teammates. Only the professor and perhaps a TA will see your comments.

The ratings you give your peers and yourself will be transformed into a numeric score for each team member. Each assignment is worth 13pt. The assignment grading will account for 10pt, and the peer evaluation score will account for 3pt. The peer evaluation score is computed on a scale of 20, where 20 equals the complete 3pt. The 10pt will be identical for all team members, while the 3pt given for peer evaluation will be individual. The ratings you submit will not be directly revealed to your team members, but some function of them will be.
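As a rough illustration of how the two components combine, here is a minimal Python sketch. It assumes the peer evaluation score is converted to points linearly (score / 20 × 3); the text above states only that 20 corresponds to the full 3pt, so the linear scaling and the function name `assignment_points` are assumptions.

```python
PEER_POINTS = 3   # points awarded for a peer evaluation score of 20
PEER_SCALE = 20   # the peer evaluation score is computed on a scale of 20


def assignment_points(team_grade, peer_score):
    """Combine the shared assignment grade with an individual peer score.

    Assumption: the peer score scales linearly, so 20 -> 3pt and 10 -> 1.5pt.
    """
    return team_grade + peer_score / PEER_SCALE * PEER_POINTS


# Two teammates with the same 9pt assignment grade but different peer scores.
print(assignment_points(9, 20))  # 12.0
print(assignment_points(9, 17))  # approximately 11.55
```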

TL;DR

Just do your best to honestly assess yourself and your teammates. The rating system we are using is robust and works well in practice. It was used in CS 3110. It has also been studied academically; see the citations at the end of the document for details.

The rest of this document describes how scores will be calculated. The details are here as reassurance, rather than because you need to know them. It’s fine to stop reading here.

Calculation of Scores

Each qualitative rating will be transformed into a quantitative rating as follows:

Excellent: 100
Very good: 87.5
Satisfactory: 75
Ordinary: 62.5
Marginal: 50
Deficient: 37.5
Unsatisfactory: 25
Superficial: 12.5
No show: 0
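The mapping above can be written as a simple lookup table. This is only a sketch; the names `RATING_VALUES` and `to_quantitative` are illustrative and not part of any course tooling.

```python
# Quantitative value for each qualitative rating, as listed above.
RATING_VALUES = {
    "excellent": 100.0,
    "very good": 87.5,
    "satisfactory": 75.0,
    "ordinary": 62.5,
    "marginal": 50.0,
    "deficient": 37.5,
    "unsatisfactory": 25.0,
    "superficial": 12.5,
    "no show": 0.0,
}


def to_quantitative(label):
    """Look up a rating label, ignoring case and surrounding whitespace."""
    return RATING_VALUES[label.strip().lower()]


print(to_quantitative("Very good"))  # 87.5
```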

Suppose that a team of three people submits the following ratings:

Name	Vote 1	Vote 2	Vote 3
David	87.5	100	87.5
Anne	100	100	87.5
Michael	75	75	75

Each vote was submitted by one of the team members and rates all three team members, including the voter. For example, David might have submitted Vote 2, in which he gives himself and Anne ratings of 100 and gives Michael a rating of 75. However, the calculations described below do not depend on who submitted which vote. A four-person team would have one additional row and one additional column.

The individual rating for a team member is their average quantitative rating, including their own self-rating. These are the individual ratings for our example team:

Name	Individual Rating
David	91.67
Anne	95.83
Michael	75

The team rating is the average of all the quantitative ratings for all team members. Our example team has nine quantitative ratings, and the average of them is 87.5, so that is the team rating.
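The individual and team ratings for the example team can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative.

```python
# Ratings received by each member in the example (Vote 1, Vote 2, Vote 3).
received = {
    "David": [87.5, 100, 87.5],
    "Anne": [100, 100, 87.5],
    "Michael": [75, 75, 75],
}

# Individual rating: average of the ratings a member received,
# including their own self-rating.
individual = {name: sum(r) / len(r) for name, r in received.items()}

# Team rating: average over every rating in the table.
all_ratings = [x for r in received.values() for x in r]
team = sum(all_ratings) / len(all_ratings)

for name, rating in individual.items():
    print(f"{name}: {rating:.2f}")
print(f"Team: {team:.2f}")
```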

The individual adjustment factor (henceforth, factor) is an individual’s rating divided by the team rating. The factor is capped at 1.05. The teamwork score is the factor times 20, rounded to the nearest integer. For our example team, the factors and teamwork scores are as follows:

Name	Factor	Score
David	1.048	21
Anne	1.050	21
Michael	0.857	17

That teamwork score is what will be used for the Peer Evaluation component of the grade. Note that it’s possible for some team members to get a small bonus.
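The full calculation, from ratings to teamwork scores, can be sketched as follows. The function name `teamwork_scores` is illustrative, and the tie-breaking rule is an assumption: the text says only "rounded to the nearest integer", so this sketch rounds halves up.

```python
import math

FACTOR_CAP = 1.05   # cap on the individual adjustment factor
SCORE_SCALE = 20    # teamwork scores are on a scale of 20


def teamwork_scores(received):
    """Map each member to (factor, score).

    `received` maps a member's name to the list of quantitative ratings
    that member received, one per vote (self-rating included).
    """
    individual = {name: sum(r) / len(r) for name, r in received.items()}
    all_ratings = [x for r in received.values() for x in r]
    team = sum(all_ratings) / len(all_ratings)
    result = {}
    for name, rating in individual.items():
        factor = min(rating / team, FACTOR_CAP)
        # Assumption: halves round up; the text does not specify tie-breaking.
        score = math.floor(factor * SCORE_SCALE + 0.5)
        result[name] = (factor, score)
    return result


example = {
    "David": [87.5, 100, 87.5],
    "Anne": [100, 100, 87.5],
    "Michael": [75, 75, 75],
}
for name, (factor, score) in teamwork_scores(example).items():
    print(f"{name}: factor {factor:.3f}, score {score}")
```

For the example team this yields scores of 21, 21, and 17, matching the table above.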

How this Worked in the Past

A similar calculation was used in CS 3110 in Ithaca. It resulted in a mean factor of 1.010 and standard deviation of 0.091. So, only in rather extreme situations would anyone lose more than about 5 points from their teamwork score. We therefore recommend that, instead of trying to overanalyze or game this calculation, you simply fill out the evaluations as honestly as you can.

Some Examples

Next, we discuss some situations that might arise and how this scoring system handles them.

Everyone gives the same rating to everyone. Then everyone gets a teamwork score of 20. For example:

Name	Vote 1	Vote 2	Vote 3	Individual	Factor	Score
David	75	75	75	75	1	20
Anne	75	75	75	75	1	20
Michael	75	75	75	75	1	20
			Team:	75

Note that it doesn’t matter whether everyone used 75 or 100 or 25 for their votes. As long as everyone agrees, everyone gets the score of 20.
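To see why, write v for the common vote, r_i for member i's individual rating, T for the team rating, and f_i for the factor (this notation is introduced here for illustration only). Assuming v > 0:

```latex
r_i = v, \qquad T = v, \qquad
f_i = \min\!\left(\frac{r_i}{T},\; 1.05\right) = \min(1,\; 1.05) = 1, \qquad
\text{score}_i = 20 \cdot f_i = 20.
```

The case v = 0 is not covered by the description above, since the team rating would then be zero.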

One person dislikes the rest of the team. Then the other team members’ scores go down, but not by much.

Name	Vote 1	Vote 2	Vote 3	Individual	Factor	Score
David	100	100	0	66.67	0.857	17
Anne	100	100	0	66.67	0.857	17
Michael	100	100	100	100	1.05	21
			Team:	77.78

Whoever submitted Vote 3 (probably Michael) has caused David and Anne’s scores to go down by 3 points. Relative to their final grade for the entire course, this makes little difference.

The rest of the team dislikes one person. That person’s score goes down by about half.

Name	Vote 1	Vote 2	Vote 3	Individual	Factor	Score
David	100	100	100	100	1.05	21
Anne	100	100	100	100	1.05	21
Michael	0	0	100	33.33	0.429	9
			Team:	77.78

It looks like David and Anne don’t like Michael. He loses 11 points. This might be enough to impact his final letter grade, but no more than that. This is an extreme situation, because it makes Michael’s factor go down so low. In such cases, the professor will read (i) the written comments provided by the other team members and (ii) the activity on the GitHub repository to see whether they provide justification for lowering Michael’s score. If the professor thinks the other team members have been too critical, then the professor could raise Michael’s factor.

The dislike is mutual. Then the outcome doesn’t change by much.

Name	Vote 1	Vote 2	Vote 3	Individual	Factor	Score
David	100	100	0	66.67	1.05	21
Anne	100	100	0	66.67	1.05	21
Michael	0	0	100	33.33	0.6	12
			Team:	55.56

This time Michael dislikes David and Anne, too. Their scores remain unchanged; his goes up by a little.
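The three disagreement scenarios above can be checked side by side with a short script. This is a sketch under the same reading of the rules as the tables; the helper name `scores` is illustrative, and rounding halves up is an assumption, since the text says only "nearest integer".

```python
import math


def scores(table, cap=1.05, scale=20):
    """Return {name: teamwork score} for a table of received ratings."""
    individual = {name: sum(row) / len(row) for name, row in table.items()}
    # Every row has the same length, so the mean of the row means
    # equals the mean of all ratings (the team rating).
    team = sum(individual.values()) / len(individual)
    return {
        name: math.floor(min(r / team, cap) * scale + 0.5)
        for name, r in individual.items()
    }


scenarios = {
    "one dislikes the rest": {
        "David": [100, 100, 0], "Anne": [100, 100, 0], "Michael": [100, 100, 100],
    },
    "the rest dislike one": {
        "David": [100, 100, 100], "Anne": [100, 100, 100], "Michael": [0, 0, 100],
    },
    "mutual dislike": {
        "David": [100, 100, 0], "Anne": [100, 100, 0], "Michael": [0, 0, 100],
    },
}
for label, table in scenarios.items():
    print(label, scores(table))
```

Michael's score is 21, 9, and 12 in the three scenarios, while David and Anne never drop below 17.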

Team members fail to provide ratings. If a team member fails to vote, that person’s column will be filled in automatically: a 0 will be imputed to every team member who did not vote (including the non-voter themselves), and a 25 to every team member who did. For example, suppose that Michael failed to vote. Then his vote (#3 below) would be filled in with a 25 for Anne and David and a 0 for himself:

Name	Vote 1	Vote 2	Vote 3	Individual	Factor	Score
David	100	100	25	75	1.038	21
Anne	100	100	25	75	1.038	21
Michael	100	100	0	66.67	0.923	18
			Team:	72.22

This results in about a 10% deduction for Michael.
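The imputation rule for missing votes can be sketched as follows. The function name `fill_missing_votes` and the data layout (a dictionary of votes keyed by voter) are assumptions made for illustration.

```python
IMPUTED_FOR_VOTERS = 25     # given to members who did submit a vote
IMPUTED_FOR_NON_VOTERS = 0  # given to members who did not, including the non-voter


def fill_missing_votes(members, submitted):
    """Return a complete {voter: {member: rating}} table.

    `submitted` holds only the votes that were actually cast; a column is
    imputed for every member of `members` who is missing from it.
    """
    non_voters = {m for m in members if m not in submitted}
    votes = {voter: dict(column) for voter, column in submitted.items()}
    for absent in non_voters:
        votes[absent] = {
            m: IMPUTED_FOR_NON_VOTERS if m in non_voters else IMPUTED_FOR_VOTERS
            for m in members
        }
    return votes


members = ["David", "Anne", "Michael"]
# David and Anne rate everyone 100; Michael does not vote.
submitted = {
    "David": {m: 100 for m in members},
    "Anne": {m: 100 for m in members},
}
votes = fill_missing_votes(members, submitted)
print(votes["Michael"])  # {'David': 25, 'Anne': 25, 'Michael': 0}

# Ratings received by each member, in voter order (the rows of the table above).
received = {m: [votes[voter][m] for voter in members] for m in members}
print(received)
```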

Acknowledgment

This evaluation scheme was adapted from CS 3110 in Ithaca. The core of this rating scheme has been examined and found to be highly useful and infrequently problematic in three academic publications:

R.W. Brown. Autorating: Getting Individual Marks from Team Marks and Enhancing Teamwork. IEEE Frontiers in Education Conference, 1995.
D.B. Kaufman, R.M. Felder, H. Fuller. Accounting for Individual Effort in Cooperative Learning Teams. J. Engr. Education 89(2): 133-140, 2000.
B. Oakley, R.M. Felder, R. Brent, I. Elhajj. Turning Student Groups into Effective Teams. J. Student Centered Learning 2(1): 9-34, 2004.