Disparate Impact
Calculates the disparate impact ratio for a single protected attribute by comparing favorable-outcome rates between the privileged and unprivileged groups.
def calculate_disparate_impact(df, outcome_column, protected_attribute,
privileged_value, unprivileged_value, favorable_result)
Parameters
- df (pd.DataFrame): The DataFrame containing the data. It should include both the outcome column and the protected attribute column.
- outcome_column (str): The name of the column in the DataFrame that represents the binary outcome. The outcomes should be encoded as 1 for a positive outcome and 0 for a negative outcome.
- protected_attribute (str): The name of the column in the DataFrame that represents the protected attribute. This could be any categorical attribute for which fairness is to be assessed (e.g., ‘gender’, ‘race’).
- privileged_value (str or int): The value in the protected attribute column that represents the privileged group. For example, if the protected attribute is ‘gender’, the privileged value could be ‘male’.
- unprivileged_value (str or int): The value in the protected attribute column that represents the unprivileged group. Following the earlier example, this could be ‘female’ if the protected attribute is ‘gender’.
- favorable_result (int): The value in the outcome column that represents a favorable result. Typically, this is 1 (the positive outcome).
Returns
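float: The disparate impact ratio, conventionally the favorable-outcome rate of the unprivileged group divided by that of the privileged group. Under that convention the ratio is never negative: a value of 1 implies parity, a value below 1 indicates bias against the unprivileged group, and a value above 1 indicates bias in favor of the unprivileged group.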
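The source does not show how the function is implemented. As a rough illustration only, the sketch below assumes the conventional definition of the ratio (favorable-outcome rate of the unprivileged group divided by that of the privileged group). The name disparate_impact_sketch and the NaN handling for a zero privileged rate are hypothetical choices made for this example, not the library's actual code.

```python
import pandas as pd


def disparate_impact_sketch(df, outcome_column, protected_attribute,
                            privileged_value, unprivileged_value,
                            favorable_result=1):
    """Hypothetical re-implementation; not the library's actual code."""
    # Boolean Series: True where the outcome equals the favorable result.
    favorable = df[outcome_column] == favorable_result

    # Favorable-outcome rate within each group (mean of a boolean Series).
    privileged_rate = favorable[df[protected_attribute] == privileged_value].mean()
    unprivileged_rate = favorable[df[protected_attribute] == unprivileged_value].mean()

    # Assumed convention: unprivileged rate divided by privileged rate.
    # A privileged rate of 0 leaves the ratio undefined, so return NaN.
    if privileged_rate == 0:
        return float("nan")
    return float(unprivileged_rate / privileged_rate)
```

Taking the mean of a boolean Series gives the share of True values, which keeps the sketch independent of how the favorable result is encoded.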
Example
This example demonstrates how to use the calculate_disparate_impact function on a sample dataset. The dataset contains information on loan approvals, with ‘gender’ as the protected attribute.
>>> import pandas as pd
>>> from util import calculate_disparate_impact
>>> # Example dataset
>>> data = {
...     'gender': ['male', 'female', 'male', 'female', 'male', 'female', 'male', 'female'],
...     'loan_approved': [1, 0, 1, 1, 1, 1, 1, 0]
... }
>>> df = pd.DataFrame(data)
>>> outcome_column = 'loan_approved'
>>> protected_attribute = 'gender'
>>> privileged_value = 'male'
>>> unprivileged_value = 'female'
>>> disparate_impact = calculate_disparate_impact(
...     df, outcome_column, protected_attribute,
...     privileged_value, unprivileged_value, 1)
>>> print("Disparate Impact Ratio:", disparate_impact)
Disparate Impact Ratio: 0.5
The output compares loan approval rates between females and males. In the sample data, all four male applicants are approved (rate 1.0) and two of the four female applicants are approved (rate 0.5), so the ratio is 0.5 / 1.0 = 0.5. Read as the unprivileged rate divided by the privileged rate, a value close to 1 implies parity, a value below 1 indicates bias against the unprivileged group, and a value above 1 indicates bias in favor of the unprivileged group.
With this output, we can see that there is a bias against the unprivileged group (in this case, ‘female’) in the context of loan approvals.
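To see where the 0.5 comes from, the per-group approval rates can be computed directly from the example data. This is a small check using plain pandas, assuming the ratio is the unprivileged rate divided by the privileged rate, which the source does not state explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['male', 'female', 'male', 'female', 'male', 'female', 'male', 'female'],
    'loan_approved': [1, 0, 1, 1, 1, 1, 1, 0],
})

# Share of favorable outcomes (loan_approved == 1) within each gender.
approval_rates = (df['loan_approved'] == 1).groupby(df['gender']).mean()
print(approval_rates['male'])    # 1.0 (4 of 4 approved)
print(approval_rates['female'])  # 0.5 (2 of 4 approved)

# Assumed convention: unprivileged rate / privileged rate.
ratio = approval_rates['female'] / approval_rates['male']
print(ratio)  # 0.5
```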