Black-Box SimBA Attack on 2D Spirals
This tutorial demonstrates how to perform a black-box adversarial attack using SimBA (Simple Black-box Attack) on a 2D spiral classification problem. BasicRandomSearch is our implementation of the SimBA algorithm.
What you will learn:
- How to create and train a model on a 2D spiral dataset
- How to visualize decision boundaries and adversarial perturbations
- How to use BasicRandomSearch (SimBA) for black-box attacks
- How attack success varies with different epsilon values
Prerequisites
Make sure you have the following packages installed: Flux, AdversarialAttacks, Plots, Statistics, and LinearAlgebra.
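If any of them are missing, they can typically be added through Pkg. A minimal sketch (the repository URL below is a placeholder, not a real address):
using Pkg
Pkg.add(["Flux", "Plots"])  # Statistics and LinearAlgebra ship with Julia's standard library
# AdversarialAttacks may not be in the General registry; if so, add it from its
# source repository instead, e.g. Pkg.add(url = "<repository URL>")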
using Random
using Flux
using AdversarialAttacks
using Plots
using Statistics
using LinearAlgebra: norm
Random.seed!(42)
println("=== SimBA Attack Demo ===\n")=== SimBA Attack Demo ===1. Create 2D spiral dataset
We generate a synthetic two-class dataset where each class forms a spiral pattern. This provides a challenging non-linear classification boundary that makes for interesting visualizations of adversarial perturbations.
function make_spirals(n_points = 100; noise = 0.3)
t = range(0, 4π, length = n_points)
# Class 1: spiral going one way
x1 = t .* cos.(t) .+ noise .* randn(n_points)
y1 = t .* sin.(t) .+ noise .* randn(n_points)
# Class 2: spiral going the other way
x2 = t .* cos.(t .+ π) .+ noise .* randn(n_points)
y2 = t .* sin.(t .+ π) .+ noise .* randn(n_points)
X = hcat(vcat(x1, x2), vcat(y1, y2))' # 2 x 2n
y = vcat(ones(Int, n_points), 2 * ones(Int, n_points))
# Normalize
X = (X .- mean(X, dims = 2)) ./ std(X, dims = 2)
return Float32.(X), y
end
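Before moving on, it can help to eyeball the raw spirals using the Plots package loaded above. A minimal sketch (the _demo names are only used in this snippet):
X_demo, y_demo = make_spirals(150; noise = 0.4)
scatter(X_demo[1, y_demo .== 1], X_demo[2, y_demo .== 1], color = :blue, label = "class 1")
scatter!(X_demo[1, y_demo .== 2], X_demo[2, y_demo .== 2], color = :red, label = "class 2", aspect_ratio = :equal)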
println("Generating spiral dataset...")
X, y = make_spirals(150; noise = 0.4)
2. Train simple neural network
We train a small feedforward neural network (2→16→16→2) on the spiral data. The model learns to separate the two spirals using ReLU activations and cross-entropy loss.
function train_model(X, y; epochs = 500)
model = Chain(
Dense(2 => 16, relu),
Dense(16 => 16, relu),
Dense(16 => 2),
)
Y_onehot = Flux.onehotbatch(y, 1:2)
opt = Flux.setup(Adam(0.01), model)
for epoch in 1:epochs
loss, grads = Flux.withgradient(model) do m
Flux.logitcrossentropy(m(X), Y_onehot)
end
Flux.update!(opt, model, grads[1])
if epoch % 100 == 0
acc = mean(Flux.onecold(model(X)) .== y)
println("Epoch $epoch: loss = $(round(loss, digits = 4)), acc = $(round(acc, digits = 3))")
end
end
return model
end
println("\nTraining neural network...")
model = train_model(X, y; epochs = 500)
final_acc = mean(Flux.onecold(model(X)) .== y)
println("Final accuracy: $(round(100 * final_acc, digits = 1))%")
Training neural network...
Epoch 100: loss = 0.5588, acc = 0.627
Epoch 200: loss = 0.4764, acc = 0.783
Epoch 300: loss = 0.0963, acc = 0.987
Epoch 400: loss = 0.0414, acc = 0.99
Epoch 500: loss = 0.0312, acc = 0.99
Final accuracy: 99.0%
3. Visualization helpers
These functions help us visualize the decision boundary and attack results. The plot_decision_boundary! function creates a contour plot showing which regions the model predicts as class 1 vs class 2.
function plot_decision_boundary!(plt, model; resolution = 100, alpha = 0.3)
xs = range(-3, 3, length = resolution)
ys = range(-3, 3, length = resolution)
Z = zeros(resolution, resolution)
for (i, x) in enumerate(xs), (j, y) in enumerate(ys)
pred = model(Float32[x, y])
Z[j, i] = sign(pred[1] - pred[2])
end
# Replace NaN values to prevent plotting errors
replace!(Z, NaN => 0.0)
return contourf!(
plt, xs, ys, Z, levels = [-1, 0, 1],
c = cgrad([:lightblue, :lightsalmon]), alpha = alpha,
linewidth = 0, colorbar = false
)
end
function plot_attack_results(X, y, model, atk; n_samples = 20)
indices = randperm(size(X, 2))[1:n_samples]
plt = plot(
size = (800, 700), title = "SimBA Attack (ε=$(atk.epsilon))",
xlabel = "x₁", ylabel = "x₂", legend = :topright
)
plot_decision_boundary!(plt, model)
successful_attacks = 0
total_perturbation = 0.0
# Track which legend entries we've added
shown_class1 = false
shown_class2 = false
shown_success = false
shown_fail = false
for idx in indices
x_orig = X[:, idx]
label_onehot = Flux.onehot(y[idx], 1:2)
sample = (data = x_orig, label = label_onehot)
x_adv = attack(atk, model, sample)
# Skip if attack produced NaN or Inf values
if any(isnan.(x_adv)) || any(isinf.(x_adv))
continue
end
pred_orig = argmax(model(x_orig))
pred_adv = argmax(model(x_adv))
success = pred_orig != pred_adv
if success
successful_attacks += 1
end
total_perturbation += norm(x_adv - x_orig)
# Plot original point
color = y[idx] == 1 ? :blue : :red
show_label = (y[idx] == 1 && !shown_class1) || (y[idx] == 2 && !shown_class2)
scatter!(
plt, [x_orig[1]], [x_orig[2]],
color = color, markersize = 8, markerstrokewidth = 2,
label = show_label ? "Original (class $(y[idx]))" : ""
)
if y[idx] == 1
shown_class1 = true
else
shown_class2 = true
end
# Plot adversarial point
if success
scatter!(
plt, [x_adv[1]], [x_adv[2]],
color = :black, marker = :x, markersize = 10, markerstrokewidth = 3,
label = !shown_success ? "Successful attack" : ""
)
shown_success = true
else
scatter!(
plt, [x_adv[1]], [x_adv[2]],
color = :gray, marker = :circle, markersize = 5, alpha = 0.5,
label = !shown_fail ? "Failed attack" : ""
)
shown_fail = true
end
# Draw arrow from original to adversarial
arrow_color = success ? :black : :gray
plot!(
plt, [x_orig[1], x_adv[1]], [x_orig[2], x_adv[2]],
color = arrow_color, alpha = 0.5, linewidth = 1, arrow = true, label = ""
)
end
attack_rate = round(100 * successful_attacks / n_samples, digits = 1)
avg_pert = round(total_perturbation / n_samples, digits = 3)
annotate!(plt, -2.8, 2.8, text("Attack success: $attack_rate%\nAvg perturbation: $avg_pert", 10, :left))
return plt
end
function compare_epsilons(X, y, model; epsilons = [0.02, 0.05, 0.1, 0.3], n_samples = 30)
plots = []
bounds = [(-3.5, 3.5), (-3.5, 3.5)]
for ε in epsilons
atk = BasicRandomSearch(epsilon = Float32(ε), max_iter = 20, bounds = bounds)
plt = plot_attack_results(X, y, model, atk; n_samples = n_samples)
title!(plt, "ε = $ε")
push!(plots, plt)
end
combined = plot(plots..., layout = (2, 2), size = (1200, 1100))
return combined
end
4. Run attack and visualize
We run the BasicRandomSearch attack with ε=0.1 and visualize the results. The attack tries to find small perturbations that cause misclassification by randomly probing the input space. Original points are shown in their class color (blue/red), successful adversarial examples as black X markers, and failed attacks as gray circles.
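Under the hood, SimBA-style attacks greedily probe one random direction at a time and keep any step of size ε that lowers the model's confidence in the true class. The function below is only a rough sketch of that idea, not the BasicRandomSearch implementation from AdversarialAttacks; the simba_sketch name and its exact acceptance rule are illustrative.
function simba_sketch(model, x, label; epsilon = 0.1f0, max_iter = 20)
    x_adv = copy(x)
    for _ in 1:max_iter
        argmax(model(x_adv)) == label || break                      # stop once misclassified
        q = zeros(Float32, length(x)); q[rand(1:length(x))] = 1f0   # random coordinate direction
        p_now = Flux.softmax(model(x_adv))[label]
        for step in (epsilon, -epsilon)                             # try +ε, then -ε
            candidate = x_adv .+ step .* q
            if Flux.softmax(model(candidate))[label] < p_now
                x_adv = candidate                                   # keep the first step that lowers p(true class)
                break
            end
        end
    end
    return x_adv
end
x_adv_demo = simba_sketch(model, X[:, 1], y[1])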
println("\nRunning SimBA attack visualization...")
bounds = [(-3.5, 3.5), (-3.5, 3.5)] # Set bounds for normalized data
atk = BasicRandomSearch(epsilon = 0.1f0, max_iter = 20, bounds = bounds)
p1 = plot_attack_results(X, y, model, atk; n_samples = 25)
OUTPUTS_DIR = joinpath(@__DIR__, "outputs")
mkpath(OUTPUTS_DIR)
5. Compare different epsilon values
We compare attack success rates across different perturbation budgets (ε). Larger ε values allow stronger perturbations, making attacks more likely to succeed but also more visible. The grid layout shows how attack effectiveness scales with perturbation budget.
println("\nComparing different epsilon values...")
p2 = compare_epsilons(X, y, model; epsilons = [0.02, 0.05, 0.1, 0.3], n_samples = 25)
Common edits to try
- Change epsilon values to see how perturbation budget affects attack success
- Adjust max_iter to give the attack more or fewer queries (see the example after this list)
- Modify noise in make_spirals() to change problem difficulty
- Try different network architectures by changing the Dense layer sizes
- Change n_samples to attack more or fewer points
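For example, giving the attack a larger query budget, assuming the same BasicRandomSearch keyword interface used above (atk_more and p3 are just illustrative names):
atk_more = BasicRandomSearch(epsilon = 0.1f0, max_iter = 100, bounds = bounds)
p3 = plot_attack_results(X, y, model, atk_more; n_samples = 25)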