May 13, 2026

Amazon Web Services (AWS) is introducing a new way for AI agents to interact with software by giving them access to full virtual desktops in the cloud. The new preview feature allows developers to assign agents unique identities through AWS Identity and Access Management, letting them log into WorkSpaces virtual PCs and operate applications as if they were human users.
The setup effectively turns AI agents into autonomous desktop operators. Using pre-signed URLs tied to their credentials, agents can access a dedicated WorkSpace and interact with software through a managed interface that supports screenshots, mouse movements, and text input. AWS says giving each agent a distinct identity improves observability, making it easier to track actions and separate automated behavior from human activity.
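AWS has not published the preview's API surface, but the per-agent identity model maps onto standard IAM primitives. As a minimal sketch (the helper name, role ARN format, and session-name convention are illustrative assumptions, not AWS's published design), each agent could assume its own IAM role so that its actions appear in CloudTrail under a distinct session:

```python
def assume_role_request(agent_id: str, role_arn: str) -> dict:
    """Build STS AssumeRole parameters for a per-agent identity.

    Giving each agent a distinct RoleSessionName means every API call
    it makes is attributable to that agent in CloudTrail, separating
    automated behavior from human activity.
    """
    return {
        "RoleArn": role_arn,
        # Session name convention is an assumption for illustration.
        "RoleSessionName": f"agent-{agent_id}",
    }
```

The dict would be passed to the STS `AssumeRole` call (with the AWS SDK for Python: `boto3.client("sts").assume_role(**assume_role_request(...))`). The pre-signed WorkSpace URL itself is part of the preview and has no public SDK call to show here.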
This approach is designed for a specific gap in enterprise software: systems that lack APIs. While APIs remain the most efficient way for software to communicate, many legacy tools, proprietary platforms, and thick-client applications still require direct user interaction. By placing agents inside virtual desktops, AWS enables automation in environments where traditional integrations are not possible.
The underlying infrastructure is flexible. AWS WorkSpaces can be provisioned across a wide range of configurations – from lightweight instances with a single virtual CPU and 2 GB of RAM to high-performance machines equipped with GPUs, 32 vCPUs, and up to 256 GB of memory. These virtual PCs can run continuously under a flat monthly rate or spin up on demand with hourly billing, making them well-suited for short-lived, task-specific workloads.
That ephemeral model is central to the design. Organizations can launch a virtual desktop, allow an agent to complete a task, and then shut it down – minimizing exposure and cost. Running these environments inside a virtual private cloud also adds a layer of isolation, which may be preferable to deploying agents directly on internal networks or physical machines.
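The launch–run–terminate lifecycle rests on the existing WorkSpaces API. A sketch of the two request payloads involved, using the `AUTO_STOP` running mode that pairs with hourly billing (the directory, user, and bundle identifiers are placeholders):

```python
def launch_request(directory_id: str, user_name: str, bundle_id: str) -> dict:
    """Payload for the WorkSpaces CreateWorkspaces call.

    AUTO_STOP with hourly billing suits short-lived, task-specific
    agent workloads: the desktop suspends after the idle timeout.
    """
    return {
        "Workspaces": [{
            "DirectoryId": directory_id,
            "UserName": user_name,
            "BundleId": bundle_id,
            "WorkspaceProperties": {
                "RunningMode": "AUTO_STOP",
                # Timeout must be a multiple of 60 minutes.
                "RunningModeAutoStopTimeoutInMinutes": 60,
            },
        }]
    }

def terminate_request(workspace_id: str) -> dict:
    """Payload for TerminateWorkspaces: tearing the desktop down once
    the agent's task is done minimizes both exposure and cost."""
    return {"TerminateWorkspaceRequests": [{"WorkspaceId": workspace_id}]}
```

With the AWS SDK for Python these would be sent as `boto3.client("workspaces").create_workspaces(**launch_request(...))` and `terminate_workspaces(**terminate_request(...))`.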
AWS is not alone in exploring this direction. Microsoft has also introduced agent-focused capabilities within its Windows 365 service, signaling a broader industry shift toward “computer-use agents” that operate graphical interfaces rather than structured APIs.
These agents rely heavily on computer vision. Instead of directly calling functions, they interpret screenshots or video feeds of a desktop environment, decide what action to take, and then simulate user inputs like clicking or typing. While powerful, this method introduces significant computational overhead.
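Stripped of any particular model or vendor, the loop these agents run is observe, decide, act. A minimal self-contained sketch (the `Action` type and the callback names are illustrative, not any vendor's API): each iteration captures a screenshot, asks a decision function (in practice, a vision model call) for the next action, and injects the simulated input until the task is done.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # "click", "type", or "done"
    x: int = 0      # screen coordinates for clicks
    y: int = 0
    text: str = ""  # keystrokes for "type" actions

def run_agent(capture_screenshot, decide, apply_input, max_steps: int = 20) -> int:
    """Observe-decide-act loop for a computer-use agent.

    capture_screenshot() -> image bytes of the current desktop
    decide(image)        -> next Action (the expensive vision-model call)
    apply_input(action)  -> injects the simulated click or keystroke

    Returns the number of steps taken. Every step costs a full
    screenshot-plus-inference round trip, which is where the token
    overhead discussed below comes from.
    """
    for step in range(max_steps):
        screenshot = capture_screenshot()
        action = decide(screenshot)
        if action.kind == "done":
            return step + 1
        apply_input(action)
    return max_steps
```

Note that even a trivial interaction takes several full loop iterations, each carrying an image through the model, which is why these workflows inherently cost more than a single API call.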
Research from AI firm Reflex highlights the cost implications. In one benchmark, a browser-based vision agent required roughly 500,000 tokens just to interact with a dropdown menu—suggesting that such approaches can be up to 45 times more expensive than equivalent API-based interactions. The company argues that while improvements in model efficiency may reduce costs over time, agent-driven workflows will inherently involve more steps than direct integrations.
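Taking the reported figures at face value, the gap is easy to put in dollar terms. A back-of-envelope calculation: the token count and the 45× factor are Reflex's numbers, while the $3-per-million-tokens price is an illustrative assumption, not a quoted rate.

```python
TOKENS_PER_DROPDOWN = 500_000  # Reflex benchmark: one dropdown interaction
PRICE_PER_MTOK = 3.00          # assumed $ per 1M input tokens (illustrative)
RELATIVE_COST = 45             # vision agent vs. API-based interaction, per Reflex

vision_cost = TOKENS_PER_DROPDOWN / 1_000_000 * PRICE_PER_MTOK
api_cost = vision_cost / RELATIVE_COST

print(f"vision agent: ${vision_cost:.2f} per dropdown")  # $1.50
print(f"API path:     ${api_cost:.4f} per dropdown")     # ~$0.0333
```

At scale, cents versus fractions of a cent per UI interaction is the difference Reflex is pointing at, though the absolute numbers shift with model pricing.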
AWS acknowledges these limitations but frames them differently. According to an AWS spokesperson, benchmarks like Reflex’s represent narrow scenarios and don’t fully reflect how agents are deployed in real enterprise environments. The company emphasizes that APIs and agents serve different purposes: where APIs exist, agents should use them—but where they don’t, desktop-level automation becomes necessary.
