Attack Interface
This page documents the attack interface for generating adversarial examples.
API Reference
AdversarialAttacks.AbstractAttack — Type
Abstract supertype for all Adversarial Attacks.
Expected interface (to be implemented per concrete attack):
- name(::AbstractAttack)::String
- attack(atk::AbstractAttack, model, sample; detailed_result, kwargs...), returning an adversarial result
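A minimal sketch of what a concrete implementation might look like (the RandomNoiseAttack type, its field, and its behaviour are purely illustrative, not part of the package):
using AdversarialAttacks
# Hypothetical attack: perturb the input with signed random noise, no model queries
struct RandomNoiseAttack <: BlackBoxAttack
    epsilon::Float32
end
AdversarialAttacks.name(::RandomNoiseAttack) = "RandomNoiseAttack"
function AdversarialAttacks.attack(atk::RandomNoiseAttack, model, sample; detailed_result = false, kwargs...)
    x_adv = sample.data .+ atk.epsilon .* sign.(randn(Float32, size(sample.data)))
    detailed_result || return x_adv
    return (x_adv = x_adv, success = false, queries_used = 0, final_label = 0)
end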
AdversarialAttacks.WhiteBoxAttack — Type
Abstract type for White-box Adversarial Attacks.
White-box attacks have full access to the model's internals, including gradients, weights, and architecture. This enables the use of gradient-based optimization and other techniques to craft adversarial examples.
Use this type when the attacker can inspect and manipulate the model's internal parameters and computations. If only input-output access is available, use BlackBoxAttack instead.
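For intuition, a rough sketch of the input gradient that white-box attacks rely on, computed here with Flux directly (illustration only, not part of this package's API):
using Flux
model = Chain(Dense(2, 2), softmax)
x = rand(Float32, 2, 1)
y = Flux.onehotbatch([1], 1:2)
# White-box access: differentiate the loss with respect to the input itself
grad_x = Flux.gradient(x_ -> Flux.crossentropy(model(x_), y), x)[1]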
AdversarialAttacks.BlackBoxAttack — Type
Abstract type for Black-box Adversarial Attacks.
Black-box attacks only have access to the model's input-output behavior, without knowledge of the model's internals, gradients, or architecture. These attacks typically rely on query-based methods (e.g., optimization via repeated queries) or transferability from surrogate models.
Use BlackBoxAttack when you do not have access to the model's internal parameters or gradients, such as in deployed systems or APIs. In contrast, use WhiteBoxAttack when you have full access to the model's internals and can leverage gradient information for crafting adversarial examples.
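For intuition, a rough sketch of query-only access (illustration only, not part of this package's API):
using Flux
model = Chain(Dense(2, 2), softmax)
x = rand(Float32, 2, 1)
# Black-box access: the model is an opaque function; only its outputs are observable
queries = Ref(0)
query(x) = (queries[] += 1; model(x))
probs = query(x)                       # one query, no gradients or weights involved
predicted_class = argmax(vec(probs))
println("prediction: ", predicted_class, ", queries used: ", queries[])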
AdversarialAttacks.name — Method
name(atk::AbstractAttack) -> String

Human-readable name for an attack.
Returns
String: String representation of the attack type.
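For example (the printed string is indicative of the attack type's name):
using AdversarialAttacks
fgsm = FGSM(epsilon = 0.01f0)
println(name(fgsm))   # e.g. "FGSM"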
AdversarialAttacks.attack — Function
attack(atk::AbstractAttack, model, sample; detailed_result, kwargs...) -> adversarial_result

Generate an adversarial example by applying the attack to a sample.
Arguments
- atk::AbstractAttack: Attack configuration and algorithm.
- model: Target model to attack.
- sample: Input sample to perturb (e.g., image, text). Usually a NamedTuple (data, label).
- detailed_result::Bool=false:
  - false (default): Returns the adversarial example only (backward compatible).
  - true: Returns AttackResult with metrics.
- kwargs...: Additional attack-specific parameters.
Returns
- detailed_result=false: Adversarial example only.
- detailed_result=true: NamedTuple with fields:
  - x_adv: Adversarial example.
  - success: Whether the attack succeeded.
  - queries_used: Number of model queries.
  - final_label: Final prediction.
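A brief usage sketch of the two return modes (the model and sample below are illustrative):
using AdversarialAttacks, Flux
model = Chain(Dense(2, 2), softmax)
sample = (data = rand(Float32, 2, 1), label = Flux.onehot(1, 1:2))
atk = FGSM(epsilon = 0.01f0)
x_adv = attack(atk, model, sample)                          # adversarial example only
res   = attack(atk, model, sample; detailed_result = true)  # detailed result with metrics
println(size(res.x_adv), " ", res.queries_used)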
attack(atk::BasicRandomSearch, model::Chain, sample; detailed_result)

Perform a Black-box Adversarial Attack on the given model and sample using the Basic Random Search variant SimBA.
Arguments
- atk::BasicRandomSearch: An instance of the BasicRandomSearch (Black-box) attack.
- model::Chain: The Flux Chain (deep learning) model to be attacked.
- sample: Input sample as a named tuple with data and label.
- detailed_result::Bool=false: Return format control.
  - false (default): Returns the adversarial example only (Array).
  - true: Returns a NamedTuple with metrics (x_adv, success, queries_used, final_label).
Returns
- If detailed_result=false: Adversarial example (same type as sample.data).
- If detailed_result=true: NamedTuple containing:
  - x_adv: Adversarial example.
  - success::Bool: Whether attack succeeded.
  - queries_used::Int: Number of model queries.
  - final_label::Int: Final predicted class.
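A short usage sketch on a small Flux Chain (the model and sample are illustrative):
using AdversarialAttacks, Flux
model = Chain(Dense(2, 2, tanh), Dense(2, 2), softmax)
sample = (data = rand(Float32, 2, 1), label = Flux.onehot(1, 1:2))
brs = BasicRandomSearch(epsilon = 0.1f0, max_iter = 10)
res = attack(brs, model, sample; detailed_result = true)
println("success: ", res.success, ", queries: ", res.queries_used, ", label: ", res.final_label)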
attack(atk::BasicRandomSearch, model::DecisionTreeClassifier, sample; detailed_result)

Perform a Black-box adversarial attack on a DecisionTreeClassifier using BasicRandomSearch (SimBA).
Arguments
- atk::BasicRandomSearch: Attack instance with epsilon and optional bounds.
- model::DecisionTreeClassifier: DecisionTree.jl classifier to attack.
- sample: NamedTuple with data and label fields.
- detailed_result::Bool=false: Return format control.
  - false (default): Returns the adversarial example only (Array).
  - true: Returns a NamedTuple with metrics (x_adv, success, queries_used, final_label).
Returns
- If detailed_result=false: Adversarial example (same type as sample.data).
- If detailed_result=true: NamedTuple containing:
  - x_adv: Adversarial example.
  - success::Bool: Whether attack succeeded.
  - queries_used::Int: Number of model queries.
  - final_label::Int: Final predicted class.
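A usage sketch with DecisionTree.jl (the training data and the sample layout here are illustrative assumptions; check the data format your model was trained on):
using AdversarialAttacks
using DecisionTree
# Train a small classifier on random data (illustrative)
X = rand(Float32, 100, 2)                 # 100 samples × 2 features
y = rand([1, 2], 100)
tree = DecisionTreeClassifier(max_depth = 3)
fit!(tree, X, y)
brs = BasicRandomSearch(epsilon = 0.1f0, max_iter = 10)
sample = (data = X[1, :], label = y[1])   # illustrative sample layout
x_adv = attack(brs, tree, sample)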
attack(atk::BasicRandomSearch, mach::Machine, sample)

Black-box Adversarial Attack on an MLJ Machine (e.g. a RandomForestClassifier) using BasicRandomSearch (SimBA), via predict.
Arguments
- atk::BasicRandomSearch: Attack instance with epsilon and max_iter.
- mach::Machine: Trained MLJ machine with probabilistic predictions.
- sample: NamedTuple with data (feature vector) and label (true class index, 1-based).
- detailed_result::Bool=false: Return format control.
  - false (default): Returns the adversarial example only (Array).
  - true: Returns a NamedTuple with metrics (x_adv, success, queries_used, final_label).
Returns
- If detailed_result=false: Adversarial example (same type as sample.data).
- If detailed_result=true: NamedTuple containing:
  - x_adv: Adversarial example.
  - success::Bool: Whether attack succeeded.
  - queries_used::Int: Number of model queries.
  - final_label::Int: Final predicted class.
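A usage sketch with MLJ (the model loading, data, and sample layout are illustrative assumptions; loading RandomForestClassifier requires the corresponding interface package, e.g. MLJDecisionTreeInterface):
using AdversarialAttacks
using MLJ
# Illustrative setup: a probabilistic classifier wrapped in an MLJ machine
Forest = @load RandomForestClassifier pkg=DecisionTree verbosity=0
Xmat = rand(100, 2)
X = MLJ.table(Xmat)
y = coerce(rand(["a", "b"], 100), Multiclass)
mach = machine(Forest(), X, y)
fit!(mach)
brs = BasicRandomSearch(epsilon = 0.1f0, max_iter = 10)
sample = (data = Xmat[1, :], label = 1)   # feature vector and a 1-based class index (illustrative)
x_adv = attack(brs, mach, sample)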
attack(atk::FGSM, model, sample; loss, detailed_result)

Perform a Fast Gradient Sign Method (FGSM) White-box Adversarial Attack on the given model using the provided sample.
Arguments
- atk::FGSM: An instance of the FGSM attack.
- model::FluxModel: The machine learning (deep learning) model to be attacked.
- sample: Input sample as a named tuple with data and label.
- loss: Loss function with signature loss(model, x, y). Defaults to default_loss, i.e. cross-entropy.
- detailed_result::Bool=false: Return format control.
  - false (default): Returns the adversarial example only (Array).
  - true: Returns a NamedTuple with metrics (x_adv, success, queries_used, final_label).
Returns
- If detailed_result=false: Adversarial example (same type and shape as sample.data).
- If detailed_result=true: NamedTuple with fields:
  - x_adv: Adversarial example.
  - queries_used::Int: Number of gradient evaluations (1 for FGSM).
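A usage sketch, including a custom loss passed via the loss keyword (the model, sample, and my_loss function are illustrative):
using AdversarialAttacks, Flux
model = Chain(Dense(2, 2, tanh), Dense(2, 2), softmax)
sample = (data = rand(Float32, 2, 1), label = Flux.onehot(1, 1:2))
fgsm = FGSM(epsilon = 0.01f0)
# Default loss (cross-entropy)
x_adv = attack(fgsm, model, sample)
# Custom loss with the documented signature loss(model, x, y)
my_loss(m, x, y) = Flux.crossentropy(m(x), y)
x_adv2 = attack(fgsm, model, sample; loss = my_loss)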
Type Hierarchy
using AdversarialAttacks
println("FGSM <: WhiteBoxAttack: ", FGSM <: WhiteBoxAttack)
println("WhiteBoxAttack <: AbstractAttack: ", WhiteBoxAttack <: AbstractAttack)
println("BasicRandomSearch <: BlackBoxAttack: ", BasicRandomSearch <: BlackBoxAttack)
println("BlackBoxAttack <: AbstractAttack: ", BlackBoxAttack <: AbstractAttack)FGSM <: WhiteBoxAttack: true
WhiteBoxAttack <: AbstractAttack: true
BasicRandomSearch <: BlackBoxAttack: true
BlackBoxAttack <: AbstractAttack: true

Quick Example
using AdversarialAttacks
using Flux
# Construct attacks via the high-level API types
fgsm = FGSM(epsilon = 0.01f0)
brs = BasicRandomSearch(epsilon = 0.1f0, max_iter = 10)
brs2 = BasicRandomSearch(epsilon = 0.1f0, bounds = [(0f0, 1f0)], max_iter = 10)
println("FGSM: ", name(fgsm))
println("BSR: ", name(brs))
# Run an attack
model = Chain(
Dense(2, 2, tanh),
Dense(2, 2),
softmax,
)
sample = (data=rand(Float32, 2, 1), label=Flux.onehot(1, 1:2))
adv_sample_fgsm = attack(fgsm, model, sample)
adv_sample_brs = attack(brs, model, sample)

2×1 Matrix{Float32}:
0.41612267
0.65083104