Appium + Sauce Labs Bootcamp: Chapter 2, Touch Actions


June 11, 2026 · 5 min read · Tool Comparison


Posted June 15, 2015


This is the second in a series of posts that discuss using Appium with Sauce Labs. In the first chapter, we covered Language Bindings. This installment discusses Touch Actions; Chapter 3 covers Testing Hybrid Apps & Mobile Web; and Chapter 4 is about Advanced Desired Capabilities.

One aspect of mobile devices that needs to be automated in order to fully test applications, whether native, hybrid, or web, is the use of gestures to interact with elements. In Appium this is done through the Touch Action and Multi Touch APIs. These two APIs come from an early draft of the WebDriver W3C Specification, and are an attempt to atomize the individual actions that make up complex actions. That is to say, they provide the building blocks for any particular gesture that might be of interest.

The specification has changed recently, and the current implementation will be deprecated in favor of an implementation of the latest specification. That said, the following API will remain for some time within Appium, even as the new API is rapidly adopted in the server.

Touch Actions

The Touch Action API provides the basis of all gestures that can be automated in Appium. At its core is the ability to chain together _ad hoc_ individual actions, which will then be applied to an element in the application on the device. The basic actions that can be used are:

press
longPress
tap
moveTo
wait
release
cancel
perform

Of these, the last deserves special mention. The action perform actually sends the chain of actions to the server. Before calling perform, the client is simply recording the actions in a local data structure, but nothing is done to the application under test. Once perform is called, the actions are wrapped up in JSON and sent to the server where they are actually performed!
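The record-then-send behavior can be illustrated with a small stand-alone sketch. This is not Appium's client code: the class `RecordingTouchAction`, its `send` callback, and the JSON layout are all hypothetical, chosen only to show that chained calls merely append to a local list until perform is called.

```python
import json


class RecordingTouchAction:
    """Hypothetical stand-in for a Touch Action client (not Appium's class)."""

    def __init__(self, send):
        # `send` is an assumed callback that would deliver the payload to a server.
        self._send = send
        self._actions = []

    def _add(self, name, **options):
        # Record the step locally and return self so calls can be chained.
        self._actions.append({"action": name, "options": options})
        return self

    def press(self, x, y):
        return self._add("press", x=x, y=y)

    def move_to(self, x, y):
        return self._add("moveTo", x=x, y=y)

    def wait(self, ms):
        return self._add("wait", ms=ms)

    def release(self):
        return self._add("release")

    def perform(self):
        # Only at this point is anything serialized and handed off.
        payload = json.dumps({"actions": self._actions})
        self._actions = []
        self._send(payload)
        return self


sent = []  # collects payloads in place of a real server
action = RecordingTouchAction(sent.append)
action.press(10, 20).wait(500).move_to(0, 100).release()
assert sent == []  # nothing has left the client yet
action.perform()
assert len(sent) == 1  # the whole chain went out as one JSON payload
```

Until perform runs, the application under test would see nothing, which is why forgetting the final call is an easy mistake to make.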


The simplest action is tap. It is the only one that cannot be chained with other actions, since it is a press and release put together. The rest of the actions are straightforward, and cover the sorts of touch screen interaction that one would expect. Most interactions begin with either press or longPress, which can be performed on a point on the screen, an element, or an element with an offset from its top left corner. The only difference between the two methods is, as their names suggest, the length of time the gesture spends down.

After pressing, the gesture can include waiting and moving, to automate complex interactions. For instance, to simulate dragging an element onto another element, you might automate a longPress, moveTo, wait, and release. In Python, assuming you have a driver instance, this would look like

[code language="python"]
from appium.webdriver.common.touch_action import TouchAction

# Locate the element to drag and the element to drop it on.
source = driver.find_element_by_accessibility_id('Source')
destination = driver.find_element_by_accessibility_id('Destination')

# Queue the gesture locally, then send it to the server with perform().
action = TouchAction(driver)
action.long_press(source).move_to(destination).wait(500).release()
action.perform()
[/code]

The wait function takes a time in milliseconds, which will be the minimum amount of time after the previous action before the subsequent action is performed. It is therefore useful for synchronization, as well as for actions, like the one above, that generally require some pause in order for the position to be registered by the application itself.

For documentation, see here; for the API in various languages see: Java, Ruby, Python, PHP, Perl, C#, and JavaScript.

Position

A note on what the position arguments mean is in order. The most basic way to specify a position is to use an element. All the methods that deal with position (i.e., tap, press, longPress, and moveTo) can take an element as their point of action. Alone, this is interpreted as the center of the element. Along with the element, a point can be passed in, in the form of x and y. If both an element and a point are given to the method, the point is interpreted as an _offset_ from the top-left corner of the element.

The final possibility is a point alone. In the absence of an element, a point is taken literally, as the position on the screen, for all the "static" methods. That is, tap, press, and longPress. In the moveTo method, however, the point is interpreted as an _offset_ from the point from which the movement starts. This leads to many conceptual errors, usually indicated by either wildly erroneous moves, or out of bounds errors (the errors "The coordinates provided to an interactions operation are invalid" and "Coordinate [x=500.0, y=820.0] is outside of element rect: [0,0][480,800]" are common in this case).
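The position rules above can be restated as a small pure-Python function. This is only a sketch of the interpretation described in this section, not code from Appium: the function `resolve_position`, the `Rect` tuple layout, and the `previous` parameter are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Assumed element geometry: (left, top, width, height) in screen pixels.
Rect = Tuple[int, int, int, int]


def resolve_position(
    method: str,
    element: Optional[Rect] = None,
    point: Optional[Tuple[int, int]] = None,
    previous: Optional[Tuple[int, int]] = None,
) -> Tuple[float, float]:
    """Return the absolute screen position implied by the rules in the text."""
    if element is not None:
        left, top, width, height = element
        if point is None:
            # Element alone: the center of the element.
            return (left + width / 2, top + height / 2)
        # Element plus point: offset from the element's top-left corner.
        return (left + point[0], top + point[1])
    if point is None:
        raise ValueError("need an element, a point, or both")
    if method == "moveTo":
        # Point alone in moveTo: offset from where the movement starts.
        if previous is None:
            raise ValueError("moveTo with a bare point needs a previous position")
        return (previous[0] + point[0], previous[1] + point[1])
    # Point alone in tap, press, or longPress: literal screen position.
    return (float(point[0]), float(point[1]))


# A press at (300, 400) followed by moveTo(200, 420), where (200, 420)
# was mistakenly meant as an absolute target, ends up off a 480x800 screen.
start = resolve_position("press", point=(300, 400))
end = resolve_position("moveTo", point=(200, 420), previous=(300, 400))
```

In this illustration `end` works out to (500, 820), which lies outside a 480 by 800 screen; treating a moveTo point as absolute rather than relative is the kind of mistake that produces out of bounds errors.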

Multi Touch Actions

Mobile applications, however, are not simply interacted with using a single gesture. Simple actions such as pinching and zooming require two fingers, and more complex interactions may take even more. In order to automate such actions Appium supports the Multi Touch API, which allows you to specify multiple Touch Action chains which will be run near-simultaneously.

If, for instance, you wanted to drag one element to the position of a second, while at the same time dragging the second to the position of the first, you would first build the individual actions, then add them to a multi action object:

[code language="python"]
from appium.webdriver.common.multi_action import MultiAction
from appium.webdriver.common.touch_action import TouchAction

el1 = driver.find_element_by_accessibility_id('Element 1')
el2 = driver.find_element_by_accessibility_id('Element 2')

# First finger: drag element 1 onto element 2.
action1 = TouchAction(driver)
action1.long_press(el1).move_to(el2).wait(500).release()

# Second finger: drag element 2 onto element 1.
action2 = TouchAction(driver)
action2.long_press(el2).move_to(el1).wait(500).release()

# Combine both chains and send them to the server together.
multi = MultiAction(driver)
multi.add(action1, action2)
multi.perform()
[/code]

You will notice that it is on the Multi Action object that we call perform here, rather than on the individual actions. As above, before this perform is called, nothing is sent to the server. The client simply keeps track of the actions added, and when perform is called it packages up the information and sends it to the server for processing.

So, once you have the individual gestures working, getting complex multi-pointer gestures to work is as simple as adding them to the MultiAction object, and sending them to the server with perform! Appium does the rest!
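As a rough illustration of planning a two-finger gesture such as a pinch, the sketch below computes the steps for two chains in plain Python. The helper `pinch_chains` and its dictionary layout are hypothetical and are not part of any Appium client; the steps would still need to be mapped onto two Touch Action chains and added to a Multi Action object.

```python
from typing import Dict, List, Tuple

Step = Dict[str, object]


def pinch_chains(
    center: Tuple[int, int], spread: int, pause_ms: int = 500
) -> List[List[Step]]:
    """Plan two finger chains that start `spread` pixels above and below
    `center` and both move to it. Hypothetical helper, not an Appium API."""
    cx, cy = center
    chains = []
    for direction in (-1, 1):
        start_y = cy + direction * spread
        chains.append([
            # Bare point in press: a literal screen position.
            {"action": "press", "x": cx, "y": start_y},
            # Bare point in moveTo: an offset from the press position.
            {"action": "moveTo", "x": 0, "y": -direction * spread},
            {"action": "wait", "ms": pause_ms},
            {"action": "release"},
        ])
    return chains


top_finger, bottom_finger = pinch_chains(center=(240, 400), spread=150)
```

Note that the moveTo steps carry relative offsets, following the rule from the Position section that a bare point in moveTo is measured from where the movement starts.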

For language-specific documentation on the Multi Action API, see: Java, Python, PHP, C#, and JavaScript.

Full examples

Full examples can be found in Appium's sample-code repository on GitHub.

Drag and drop:

Smiley faces: