Robustness Evaluation Suite

This page documents the Robustness Evaluation Suite: the report type and methods for measuring model accuracy under adversarial attack.

AdversarialAttacks.RobustnessReport — Type
RobustnessReport

Report on model robustness against an adversarial attack. Printing a RobustnessReport (via println(report)) displays a formatted summary including clean/adversarial accuracy, attack success rate, and robustness score.

Fields

  • num_samples::Int: Total samples evaluated
  • num_clean_correct::Int: Samples correctly classified before attack
  • clean_accuracy::Float64: Accuracy on clean samples
  • adv_accuracy::Float64: Accuracy on adversarial samples
  • attack_success_rate::Float64: Fraction of successful attacks among correctly classified clean samples (ASR)
  • robustness_score::Float64: 1.0 - attack_success_rate (1 - ASR)
  • num_successful_attacks::Int: Number of successful attacks
  • linf_norm_max::Float64: Maximum L∞ norm of perturbations across all samples
  • linf_norm_mean::Float64: Mean L∞ norm of perturbations across all samples
  • l2_norm_max::Float64: Maximum L2 norm of perturbations across all samples
  • l2_norm_mean::Float64: Mean L2 norm of perturbations across all samples
  • l1_norm_max::Float64: Maximum L1 norm of perturbations across all samples
  • l1_norm_mean::Float64: Mean L1 norm of perturbations across all samples
  • mean_queries_all::Union{Float64, Missing}: Mean number of queries across all samples (if applicable)
  • mean_queries_success::Union{Float64, Missing}: Mean number of queries for successful attacks (if applicable)

Note

An attack succeeds when the clean prediction is correct but the adversarial prediction is incorrect.

  • The L∞ norm measures the maximum absolute change in any feature of the input.
  • The L2 norm measures the Euclidean distance between original and adversarial samples.
  • The L1 norm measures the Manhattan distance (sum of absolute differences).
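As a quick numeric sketch of the definitions above (the two counts here are made-up values, not output of the package), attack_success_rate and robustness_score are tied together as follows:

```julia
# Made-up counts for illustration only.
num_clean_correct      = 90   # clean predictions that were correct
num_successful_attacks = 27   # of those, flipped by the attack

# ASR is computed over correctly classified clean samples,
# and the robustness score is its complement.
attack_success_rate = num_successful_attacks / num_clean_correct  # 0.3
robustness_score    = 1.0 - attack_success_rate                   # 0.7
```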
AdversarialAttacks.calculate_metrics — Method
calculate_metrics(n_test, num_clean_correct, num_adv_correct,
                  num_successful_attacks, l_norms, queries_all, queries_success)

Compute accuracy, attack success, robustness, and perturbation norm statistics for adversarial evaluation.

Arguments

  • n_test: Number of test samples
  • num_clean_correct: Number of correctly classified clean samples
  • num_adv_correct: Number of correctly classified adversarial samples
  • num_successful_attacks: Number of successful adversarial attacks
  • l_norms: Dictionary containing perturbation norm arrays with keys :linf, :l2, and :l1
  • queries_all: Array of query counts for all samples
  • queries_success: Array of query counts for successful attacks

Returns

  • A RobustnessReport containing accuracy, robustness, and norm summary metrics (maximum and mean) for all three norm types.
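A sketch of what the l_norms argument looks like and how each norm array is condensed into the max/mean fields of the report; the three arrays below are made-up values for three samples, and only the standard library is used:

```julia
using Statistics  # for mean

# Hypothetical perturbation norms for three evaluated samples;
# arrays shaped like this are what calculate_metrics receives as `l_norms`.
l_norms = Dict(
    :linf => [0.10, 0.08, 0.12],
    :l2   => [0.50, 0.40, 0.60],
    :l1   => [1.50, 1.20, 1.80],
)

# The report summarizes each norm array by its maximum and mean:
linf_norm_max  = maximum(l_norms[:linf])  # 0.12
linf_norm_mean = mean(l_norms[:linf])     # ≈ 0.1
```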
AdversarialAttacks.compute_norm — Method
compute_norm(sample_data, adv_data, p::Real)

Compute the Lp norm of the perturbation between original data and adversarial data.

This function uses LinearAlgebra.norm for optimal performance and numerical stability.

Arguments

  • sample_data: Original sample data.
  • adv_data: Adversarially perturbed version of sample_data.
  • p::Real: Order of the norm. Must be positive or Inf.
    • Common values: 1 (Manhattan/L1), 2 (Euclidean/L2), Inf (maximum/L∞).

Returns

  • Float64: The Lp norm of the perturbation ||adv_data - sample_data||_p.

Examples

original = [1.0, 2.0, 3.0]
adversarial = [1.5, 2.5, 3.5]

compute_norm(original, adversarial, 2)    # L2 (Euclidean) norm
compute_norm(original, adversarial, 1)    # L1 (Manhattan) norm
compute_norm(original, adversarial, Inf)  # L∞ (maximum) norm
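Since the docstring states that compute_norm wraps LinearAlgebra.norm, the values for the example above can be checked with the standard library alone; this sketch assumes nothing beyond the stdlib:

```julia
using LinearAlgebra  # for norm

original    = [1.0, 2.0, 3.0]
adversarial = [1.5, 2.5, 3.5]
δ = adversarial .- original          # perturbation: [0.5, 0.5, 0.5]

norm(δ, 1)    # 1.5 (sum of absolute differences)
norm(δ, 2)    # ≈ 0.8660 (sqrt(3 * 0.25))
norm(δ, Inf)  # 0.5 (largest absolute difference)
```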

References

  • Lp space: https://en.wikipedia.org/wiki/Lp_space
AdversarialAttacks.evaluate_robustness — Method
evaluate_robustness(model, atk, test_data; num_samples=100, detailed_result=false)

Evaluate model robustness by running the attack on multiple samples.

For each sample, computes clean and adversarial predictions, tracks attack success, and calculates perturbation norms (L∞, L2, and L1).

Arguments

  • model: The model to evaluate.
  • atk: The attack to use.
  • test_data: Collection of test samples.

Keyword Arguments

  • num_samples::Int=100: Number of samples to test. If greater than the number of available samples, all available samples are used.
  • detailed_result::Bool=false: Whether to return detailed attack results (including queries used) or only the adversarial example.

Returns

  • RobustnessReport: Report containing accuracy, attack success rate, robustness metrics, and perturbation statistics for the L∞, L2, and L1 norms, as well as query statistics if applicable.

Example

report = evaluate_robustness(model, FGSM(epsilon=0.1), test_data, num_samples=50)
println(report)
AdversarialAttacks.evaluation_curve — Method
evaluation_curve(model, atk_type, epsilons, test_data; num_samples=100)

Evaluate model robustness across a range of attack strengths.

For each value in epsilons, an attack of type atk_type is instantiated and used to compute clean accuracy, adversarial accuracy, attack success rate, robustness score, and perturbation norms (L∞, L2, and L1).

Arguments

  • model: Model to be evaluated.
  • atk_type: Adversarial attack type to instantiate for each epsilon value.
  • epsilons: Vector of attack strengths.
  • test_data: Test dataset.

Keyword Arguments

  • num_samples::Int=100: Number of samples used for each epsilon evaluation.

Returns

  • A dictionary containing evaluation metrics for each epsilon value:
    • :epsilons: Attack strength values.
    • :clean_accuracy: Clean accuracy for each epsilon.
    • :adv_accuracy: Adversarial accuracy for each epsilon.
    • :attack_success_rate: Attack success rate for each epsilon.
    • :robustness_score: Robustness score (1 - ASR) for each epsilon.
    • :linf_norm_mean, :linf_norm_max: L∞ norm statistics.
    • :l2_norm_mean, :l2_norm_max: L2 norm statistics.
    • :l1_norm_mean, :l1_norm_max: L1 norm statistics.

Example

results = evaluation_curve(model, FGSM, [0.01, 0.05, 0.1], test_data, num_samples=100)
println("Attack success rates: ", results[:attack_success_rate])
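One common use of the returned dictionary is to find the largest attack strength the model still tolerates; this sketch uses made-up numbers in place of a real evaluation_curve result, mirroring the documented keys:

```julia
# Hypothetical results dictionary, shaped like evaluation_curve's output.
results = Dict(
    :epsilons         => [0.01, 0.05, 0.1],
    :robustness_score => [0.9, 0.6, 0.3],
)

# Largest epsilon at which the robustness score stays at or above 0.5.
ok = results[:robustness_score] .>= 0.5
max_safe_eps = maximum(results[:epsilons][ok])  # 0.05
```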
The full field list of RobustnessReport can be inspected at the REPL:

using AdversarialAttacks
println("RobustnessReport fields: ", fieldnames(RobustnessReport))
RobustnessReport fields: (:num_samples, :num_clean_correct, :clean_accuracy, :adv_accuracy, :attack_success_rate, :robustness_score, :num_successful_attacks, :linf_norm_max, :linf_norm_mean, :l2_norm_max, :l2_norm_mean, :l1_norm_max, :l1_norm_mean, :mean_queries_all, :mean_queries_success)