Statistical Parity Difference
Calculates the statistical parity difference for a single protected attribute comparing privileged and unprivileged values.
def calculate_statistical_parity_difference(df, outcome_column, protected_attribute,
privileged_value, unprivileged_value, favorable_result)
Parameters
-
df
(pd.DataFrame
): The DataFrame containing the dataset. This should include the outcome column and the protected attribute. -
outcome_column
(str
): The name of the column in the DataFrame that represents the outcome of the prediction. The outcome should be binary, typically encoded as 1 for a favorable result and 0 for an unfavorable result. -
protected_attribute
(str
): The name of the column in the DataFrame that represents the protected attribute. This attribute is used to differentiate between the privileged and unprivileged groups (e.g., ‘gender’, ‘race’). -
privileged_value
(str
orint
): The value within the protected attribute that represents the privileged group. For instance, if ‘gender’ is the protected attribute, ‘male’ might be designated as the privileged value. -
unprivileged_value
(str
orint
): The value within the protected attribute that represents the unprivileged group. Using the same example, if ‘gender’ is the protected attribute, ‘female’ might be the unprivileged value. -
favorable_result
(int
): The value in the outcome column that is considered a favorable result. Typically, this is 1 for a positive outcome.
Returns
float
: The statistical parity difference. A value of 0 implies fairness, a negative value indicates a bias against the unprivileged group, and a positive value indicates a bias in favor of the unprivileged group.
Example
This example demonstrates how to use the calculate_statistical_parity_difference function on a sample dataset. The dataset contains information on loan approvals, with ‘gender’ as the protected attribute.
>>> import pandas as pd
>>> from util import calculate_statistical_parity_difference
# Example dataset
>>> data = {
'gender': ['male', 'female', 'male', 'female', 'male', 'female', 'male', 'female'],
'loan_approved': [1, 0, 1, 1, 1, 1, 1, 0]
}
>>> df = pd.DataFrame(data)
>>> outcome_column = 'loan_approved'
>>> protected_attribute = 'gender'
>>> privileged_value = 'male'
>>> unprivileged_value = 'female'
>>> stat_parity_diff = calculate_statistical_parity_difference(df, outcome_column, \
protected_attribute, privileged_value, unprivileged_value, 1)
>>> print("Statistical Parity Difference:", stat_parity_diff)
Statistical Parity Difference: -0.5
The output indicates the degree of statistical parity difference in the loan approval process between males and females. A value close to 0 implies fairness, a negative value indicates bias in against the unprivileged group, and a positive value indicates bias in favor of the unprivileged group.
This result indicates a bias against the unprivileged group (in this case, ‘female’) in the context of loan approvals.