Attack Interface

This page documents the attack interface for generating adversarial examples.

API Reference

AdversarialAttacks.AbstractAttack (Type)

Abstract supertype for all Adversarial Attacks.

Expected interface (to be implemented per concrete attack):

  • name(::AbstractAttack)::String
  • attack(atk::AbstractAttack, model, sample; detailed_result, kwargs...) returning an adversarial result
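
The sketch below shows how a concrete attack could satisfy this interface. NoOpAttack is a hypothetical attack used only for illustration and is not part of the package; the field names in the detailed result follow the documented NamedTuple.

using AdversarialAttacks
import AdversarialAttacks: name, attack

# Hypothetical attack that leaves the input unchanged, shown only to
# illustrate the interface expected of AbstractAttack subtypes.
struct NoOpAttack <: BlackBoxAttack end

name(::NoOpAttack) = "NoOpAttack"

function attack(atk::NoOpAttack, model, sample; detailed_result = false, kwargs...)
    x_adv = copy(sample.data)   # no perturbation applied
    detailed_result || return x_adv
    return (x_adv = x_adv, success = false, queries_used = 0, final_label = 0)
end
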
AdversarialAttacks.WhiteBoxAttack (Type)

Abstract type for White-box Adversarial Attacks.

White-box attacks have full access to the model's internals, including gradients, weights, and architecture. This enables the use of gradient-based optimization and other techniques to craft adversarial examples.

Use this type when the attacker can inspect and manipulate the model's internal parameters and computations. If only input-output access is available, use BlackBoxAttack instead.

AdversarialAttacks.BlackBoxAttack (Type)

Abstract type for Black-box Adversarial Attacks.

Black-box attacks only have access to the model's input-output behavior, without knowledge of the model's internals, gradients, or architecture. These attacks typically rely on query-based methods (e.g., optimization via repeated queries) or transferability from surrogate models.

Use BlackBoxAttack when you do not have access to the model's internal parameters or gradients, such as in deployed systems or APIs. In contrast, use WhiteBoxAttack when you have full access to the model's internals and can leverage gradient information for crafting adversarial examples.

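
Both abstract types can be used for dispatch when code needs to distinguish the two access models. The helper below is hypothetical and only illustrates this distinction; it is not part of the package API.

using AdversarialAttacks

# Hypothetical helper: what access does a given attack assume?
required_access(::WhiteBoxAttack) = "gradients, weights, and architecture"
required_access(::BlackBoxAttack) = "input-output queries only"

required_access(FGSM(epsilon = 0.01f0))                             # white-box
required_access(BasicRandomSearch(epsilon = 0.1f0, max_iter = 10))  # black-box
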
AdversarialAttacks.name (Method)
name(atk::AbstractAttack) -> String

Human-readable name for an attack.

Returns

  • String: String representation of the attack type.
AdversarialAttacks.attack (Function)
attack(atk::AbstractAttack, model, sample; detailed_result, kwargs...) -> adversarial_result

Generate an adversarial example by applying the attack to a sample.

Arguments

  • atk::AbstractAttack: Attack configuration and algorithm.
  • model: Target model to attack.
  • sample: Input sample to perturb (e.g., image, text). Usually a NamedTuple with data and label fields.
  • detailed_result::Bool=false:
    • false (default): Returns adversarial example only (backward compatible)
    • true: Returns a NamedTuple with metrics.
  • kwargs...: Additional attack-specific parameters

Returns

  • detailed_result=false: adversarial example only.
  • detailed_result=true: NamedTuple with fields:
    • x_adv: Adversarial example.
    • success: Whether the attack succeeded.
    • queries_used: Number of model queries.
    • final_label: Final predicted class.
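
A minimal sketch of both return modes, assuming a small Flux Chain and the NamedTuple sample format used in the Quick Example below:

using AdversarialAttacks
using Flux

model = Chain(Dense(4, 3), softmax)
sample = (data = rand(Float32, 4, 1), label = Flux.onehot(2, 1:3))
brs = BasicRandomSearch(epsilon = 0.1f0, max_iter = 20)

x_adv = attack(brs, model, sample)                         # adversarial example only
res   = attack(brs, model, sample; detailed_result = true) # NamedTuple with metrics

res.success        # whether the predicted label changed
res.queries_used   # number of model queries
res.final_label    # predicted class after the attack
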
attack(atk::BasicRandomSearch, model::Chain, sample; detailed_result)

Perform a Black-box Adversarial Attack on the given model and sample using Basic Random Search, a SimBA-style attack.

Arguments

  • atk::BasicRandomSearch: An instance of the BasicRandomSearch (Black-box) attack.
  • model::Chain: The Flux Chain (deep learning) model to be attacked.
  • sample: Input sample as a named tuple with data and label.
  • detailed_result::Bool=false: Return format control.
    • false (default): Returns adversarial example only (Array).
    • true: Returns NamedTuple with metrics (x_adv, success, queries_used, final_label).

Returns

  • If detailed_result=false: Adversarial example (same type as sample.data).
  • If detailed_result=true: NamedTuple containing:
    • x_adv: Adversarial example.
    • success::Bool: Whether attack succeeded.
    • queries_used::Int: Number of model queries.
    • final_label::Int: Final predicted class.
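
If the BasicRandomSearch instance was constructed with bounds (as brs2 in the Quick Example below), the perturbed sample can be kept inside a valid input range such as [0, 1]. A hedged sketch:

using AdversarialAttacks
using Flux

model = Chain(Dense(4, 2), softmax)
sample = (data = rand(Float32, 4, 1), label = Flux.onehot(1, 1:2))

# Keep every perturbed feature inside [0, 1], using a single (lo, hi) pair
# as in the Quick Example.
brs_bounded = BasicRandomSearch(epsilon = 0.1f0, bounds = [(0f0, 1f0)], max_iter = 20)
x_adv = attack(brs_bounded, model, sample)
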
attack(atk::BasicRandomSearch, model::DecisionTreeClassifier, sample; detailed_result)

Perform a Black-box adversarial attack on a DecisionTreeClassifier using BasicRandomSearch (SimBA).

Arguments

  • atk::BasicRandomSearch: Attack instance with epsilon and optional bounds.
  • model::DecisionTreeClassifier: DecisionTree.jl classifier to attack.
  • sample: NamedTuple with data and label fields.
  • detailed_result::Bool=false: Return format control.
    • false (default): Returns adversarial example only (Array).
    • true: Returns NamedTuple with metrics (x_adv, success, queries_used, final_label).

Returns

  • If detailed_result=false: Adversarial example (same type as sample.data).
  • If detailed_result=true: NamedTuple containing:
    • x_adv: Adversarial example.
    • success::Bool: Whether attack succeeded.
    • queries_used::Int: Number of model queries.
    • final_label::Int: Final predicted class.
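
A hedged sketch against a DecisionTree.jl classifier. The synthetic data and tree settings are illustrative only, and the sketch assumes DecisionTree.jl's ScikitLearn-style DecisionTreeClassifier with fit!:

using AdversarialAttacks
using DecisionTree

features = rand(Float32, 100, 4)    # 100 samples, 4 features (synthetic)
labels   = rand(1:2, 100)           # two classes

tree = DecisionTreeClassifier(max_depth = 3)
DecisionTree.fit!(tree, features, labels)

sample = (data = features[1, :], label = labels[1])
brs = BasicRandomSearch(epsilon = 0.1f0, max_iter = 50)

x_adv = attack(brs, tree, sample)
res   = attack(brs, tree, sample; detailed_result = true)
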
attack(atk::BasicRandomSearch, mach::Machine, sample; detailed_result)

Perform a Black-box Adversarial Attack on an MLJ Machine (e.g. a RandomForestClassifier) using BasicRandomSearch (SimBA), querying the model via predict.

Arguments

  • atk::BasicRandomSearch: Attack instance with epsilon and max_iter.
  • mach::Machine: Trained MLJ machine with probabilistic predictions.
  • sample: NamedTuple with data (feature vector) and label (true class index, 1-based).
  • detailed_result::Bool=false: Return format control.
    • false (default): Returns adversarial example only (Array).
    • true: Returns NamedTuple with metrics (x_adv, success, queries_used, final_label).

Returns

  • If detailed_result=false: Adversarial example (same type as sample.data).
  • If detailed_result=true: NamedTuple containing:
    • x_adv: Adversarial example.
    • success::Bool: Whether attack succeeded.
    • queries_used::Int: Number of model queries.
    • final_label::Int: Final predicted class.
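
A hedged sketch for the MLJ path, assuming MLJ plus an installed MLJDecisionTreeInterface; the dataset and model choice are illustrative only:

using AdversarialAttacks
using MLJ

RandomForest = @load RandomForestClassifier pkg=DecisionTree verbosity=0

X, y = @load_iris                     # small built-in dataset
mach = machine(RandomForest(), X, y)
fit!(mach)

# Feature vector of the first observation and its 1-based class index (setosa => 1).
Xmat = MLJ.matrix(X)
sample = (data = Float32.(Xmat[1, :]), label = 1)

brs = BasicRandomSearch(epsilon = 0.1f0, max_iter = 50)
x_adv = attack(brs, mach, sample)
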
attack(atk::FGSM, model, sample; loss, detailed_result)

Perform a Fast Gradient Sign Method (FGSM) White-box Adversarial Attack on the given model using the provided sample.

Arguments

  • atk::FGSM: An instance of the FGSM attack.
  • model::FluxModel: The machine learning (deep learning) model to be attacked.
  • sample: Input sample as a named tuple with data and label.
  • loss: Loss function with signature loss(model, x, y). Defaults to default_loss, i.e. cross-entropy.
  • detailed_result::Bool=false: Return format control.
    • false (default): Returns adversarial example only (Array).
    • true: Returns NamedTuple with metrics (x_adv, success, queries_used, final_label).

Returns

  • If detailed_result=false:
    • Adversarial example (same type and shape as sample.data).
  • If detailed_result=true:
    • NamedTuple with fields:
      • x_adv: Adversarial example.
      • queries_used::Int: Number of gradient evaluations (always 1 for FGSM).
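
The loss keyword accepts any function with the signature loss(model, x, y), so a different objective can be swapped in. A small sketch; the squared-error objective here is illustrative, not the package default:

using AdversarialAttacks
using Flux

model = Chain(Dense(2, 2, tanh), Dense(2, 2), softmax)
sample = (data = rand(Float32, 2, 1), label = Flux.onehot(1, 1:2))

# Any loss(model, x, y) works; here squared error instead of the default cross-entropy.
mse_loss(m, x, y) = Flux.mse(m(x), y)

fgsm = FGSM(epsilon = 0.01f0)
x_adv = attack(fgsm, model, sample; loss = mse_loss)
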

Type Hierarchy

using AdversarialAttacks

println("FGSM <: WhiteBoxAttack:       ", FGSM <: WhiteBoxAttack)
println("WhiteBoxAttack <: AbstractAttack: ", WhiteBoxAttack <: AbstractAttack)
println("BasicRandomSearch <: BlackBoxAttack: ", BasicRandomSearch <: BlackBoxAttack)
println("BlackBoxAttack <: AbstractAttack:    ", BlackBoxAttack <: AbstractAttack)
FGSM <: WhiteBoxAttack:       true
WhiteBoxAttack <: AbstractAttack: true
BasicRandomSearch <: BlackBoxAttack: true
BlackBoxAttack <: AbstractAttack:    true

Quick Example

using AdversarialAttacks
using Flux

# Construct attacks via the high-level API types
fgsm = FGSM(epsilon = 0.01f0)
brs = BasicRandomSearch(epsilon = 0.1f0, max_iter = 10)
brs2 = BasicRandomSearch(epsilon = 0.1f0, bounds = [(0f0, 1f0)], max_iter = 10)

println("FGSM: ", name(fgsm))
println("BSR:  ", name(brs))

# Run an attack
model = Chain(
    Dense(2, 2, tanh),
    Dense(2, 2),
    softmax,
)
sample = (data=rand(Float32, 2, 1), label=Flux.onehot(1, 1:2))

adv_sample_fgsm = attack(fgsm, model, sample)
adv_sample_brs = attack(brs, model, sample)

2×1 Matrix{Float32}:
 0.41612267
 0.65083104