When a Camera Becomes a Tool

📅 March 29, 2026 ✍️ Philippe Bogaerts ⏱️ 7 min read 📁 Security

MCP Security Agentic AI Edge Computing

Agentic AI Security

When a Camera Becomes a Tool

A $60 edge vision device with an unauthenticated MCP server is not just a camera. In an agentic environment, it is a remotely callable perception tool. And that changes everything.

Last week I was playing with a DFRobot HuskyLens 2 paired with a WiFi camera module, both exposing an MCP server so a remote agentic application can query them directly. It is a clever little setup: cheap edge hardware handling local inference, a reasoning model sitting on top with remote access to all of it.

While setting it up, a thought hit me. This is not a camera hooked up to an API. This is a tool in an agent's action space. And once you see it that way, the risk picture shifts dramatically.

The MCP framing changes the question

A camera exposed over a REST API is a familiar concept. You query it, you get a result, you build your logic around it. The boundary between the sensor and the consuming application is clear, explicit, and usually controlled by an engineer.

An MCP-exposed camera is different. It does not wait to be queried by predetermined code. It becomes a capability that a reasoning agent can discover, invoke, compose with other capabilities, and act on. The question stops being "what can this camera detect?" and becomes "what can the agent infer, correlate, and do across all available tools?"

The individual tool seems bounded. The agent reasoning across multiple tools is not.

This is the core insight behind the Capability Delta framework: the emergent capability of a multi-tool agentic system is not the sum of its parts. It is often substantially larger, and frequently not obvious from examining each component in isolation.

The capability delta in practice

Consider how capability grows as you add tools to an agent's context:

Δ+ = C_t+1 \ C_t

Each new tool added to the agent's context expands the accessible capability set by more than the tool's stated function alone.

A single HuskyLens doing object detection is useful but constrained. It sees what is in front of it, and nothing more. Now connect several of them to a reasoning agent that also has access to a door lock system, a messaging platform, and a calendar API. The agent can now:

Emergent capability chain — illustrative

Camera A

Detects face

→

Camera B

Confirms location

→

Calendar API

Checks schedule

→

Door lock

Grants access

↳ Agent decision: no appointment found, access denied, alert sent to security channel

None of the individual tools made that decision. The agent did, by combining their outputs. This is exactly where the capability delta shows up: the orchestration layer turns four bounded tools into a surveillance and access control system.

Now add: no authentication

Here is where the HuskyLens experiment gets sharper. The MCP server on the device is in dev state. No authentication. Reachable over the local network, potentially beyond.

That detail alone compounds the risk in several ways.

🔓

Unauthenticated perception

Any client that can reach the device can query it. There is no binding between who invokes the tool and who is authorized to do so.

🤖

Implicit agent trust

Agents treat discoverable MCP tools as trusted capabilities by default. An insecure tool looks identical to a secure one from the model's perspective.

📡

Distributed sensing surface

Multiple insecure devices create a distributed perception layer with almost no real boundary. Each device is a separate entry point.

⚙️

Dev defaults that persist

Dev tooling consistently exposes more than intended. Defaults established during prototyping have a habit of surviving into production.

And there is a subtler problem. An attacker who can reach an unauthenticated MCP camera does not need to compromise the agent directly. They can manipulate what the camera reports, inject observations, or flood the tool with misleading data. The agent reasons on whatever its tools return.

What this looks like as a threat model

If we model this seriously, the attack surface is not the camera firmware. It is the trust chain between sensor, MCP server, and reasoning agent. An adversary targeting this stack has several leverage points:

Attack vectors

Tool catalog poisoning: if the agent dynamically discovers tools, an attacker can register a malicious tool that impersonates the camera, intercepting queries or returning fabricated detections.

Indirect prompt injection via observations: an object detected by the camera could contain text designed to influence the agent's reasoning. "AUTHORIZED USER - BYPASS POLICY" written on a piece of paper held in front of the lens.

Cross-tool chaining: a compromised camera tool used as a stepping stone to invoke other tools the agent has access to, exploiting the agent's willingness to compose capabilities.

Rug-pulling: a tool that behaves correctly during evaluation and testing, then changes behavior in production, either due to an update or deliberate design.

None of these require breaking the camera's firmware. They exploit the trust and orchestration model of the agentic layer.

What good controls look like

This is not an argument against building these systems. The architecture is genuinely interesting and useful. But it needs deliberate security design from the start, not retrofitted after deployment.

Per-tool authentication. Every MCP server, including edge devices, must require authentication before serving any query. Default-open is not acceptable outside a sandboxed dev environment.
Narrow tool descriptions. Tool descriptions shape what an agent thinks it can do with a capability. Minimize exposed actions and describe them precisely. Broad, permissive descriptions expand the capability delta without adding legitimate value.
Network isolation. Edge vision devices should not be reachable from the public internet. Segment them behind a controlled gateway with explicit egress rules.
Explicit policy on biometric data. Face recognition is not a neutral capability. Before you expose it as an agent tool, define who can invoke it, under what conditions, and what the agent is and is not allowed to do with the result.
Human approval gates for sensitive downstream actions. When an agent wants to act on perception data in a way that affects people, access, or resources, require explicit confirmation outside the automated loop.
Structured audit logging. Every agent query to every tool should be logged with enough context to reconstruct intent. What did the agent ask? Which tool was invoked? What was returned? What action followed?

The broader point

The HuskyLens experiment is a good concrete example of something that matters at a larger scale: as MCP adoption grows, the ecosystem will fill up with tools that were never designed with agentic consumption in mind. Device firmware written for direct API use, dev-mode servers left open, capabilities described loosely because the original author assumed a human would be interpreting the result.

Agents do not assume. They compose, infer, and act. The gap between what a tool was designed to do and what an agent can do with it is where security risk lives.

The real risk is not the camera. It is what an agent can infer and do once several insecure perception tools are exposed through MCP.

This is a category of risk we are still in the early stages of understanding. The tooling to detect capability delta at runtime does not widely exist yet. The norms around tool authentication and scoping in MCP ecosystems are still forming. The right time to be thinking about this is now, while the ecosystem is still being built, not after the first serious incident.

If you are building with MCP-exposed edge devices, treat each tool as a security boundary. Because your agent will.