A/B Test Insights

The A/B tests were carried out digitally to stay consistent with the medium used for the previous two user tests. For this user test, the method was to showcase both sets of gesture patterns one after another (alternating the order from user to user) to find out which set users prefer to use, and why. In addition, I wanted to observe subtle nuances in behaviour that are valuable to my research, since each person can interpret the provided diagrams and instructions differently.

Questions (to users):
After each set
How did you feel about these gestures?
Which specific gesture felt most comfortable to perform?

After both sets
How did the gesture sets feel compared to one another?
Which one felt more comfortable to perform?
Which one do you prefer? Why?

All the interview questions garnered many responses. So far (as of the 27th of November), I have tested 12 users, the majority of them between the ages of 20 and 29, and the clear winner is gesture set A (the point-pinch set). After interviewing a few people, I noticed that most have similar opinions, just rephrased in many different ways.

Positive keywords for gesture set A include “natural”, “delicate”, “small”, “precise”, “in control”, “similar to how interactions with mobile devices are made”, and so on. Negative keywords for this gesture set include “unintuitive”, “too similar between gestures”, and “takes learning”.

One important insight is that some users specifically mentioned these gestures feel like the best candidate for interactions with public screens, since most public screens people are familiar with don’t exceed the size of a laptop screen, with the exception of digital information kiosks and fast-food ordering machines. They followed up by saying that, because these gestures are very precise, they would also be strong candidates for even slightly larger screens: most of those screens require more browsing, smaller movements, and longer interactions, and in that case a handier gesture that mimics a process of action makes more sense.

An additional insight I’ve noticed is that, unlike with the other set of gestures, nobody really mentioned the ergonomics or the detection aspect of these gestures. One possibility is that, because there were no issues, users didn’t feel these aspects were worth mentioning until specifically asked, but this is just an assumption.

Positive keywords for gesture set B included “obvious”, “big”, “clear”, “smooth”, and “visible”, while the negative keywords were many: “too big”, “unnatural”, “covering”, “complicated”, “tiring”, “takes effort”, “arm-tensing”, “unfamiliar”, “exhausting”, and “frustrating”. What is interesting here is that users said more about detection and ergonomics, which I believe is because they sensed issues.

Some users believed that a good point of these gestures is how “visible” they are to the camera, which is a plus. One user, however, believed that having the camera pick up an entire palm probably isn’t as precise: he assumed the tracked point is generated from the location of all the fingers, so detection based on a single finger would be more feasible, which is in fact the case. There was also a lot of dissonance regarding the ergonomics of these gestures, mostly users complaining that the gestures are too exhausting to perform because they require movements that are too big, and that the grabbing gesture tenses the forearm in an uncomfortable manner.
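To illustrate that user’s point, below is a minimal sketch of how the two tracking approaches could differ. It is my own illustration and assumes a 21-point hand-landmark layout like the one MediaPipe Hands returns (index 8 being the index fingertip); it is not the detection software actually used in this project.

```python
# Minimal sketch: two ways to derive a cursor point from hand landmarks.
# Assumes 21 (x, y) landmarks in the order used by MediaPipe Hands,
# where index 8 is the index fingertip -- purely illustrative.

def palm_point(landmarks):
    """Average all landmarks into one 'palm' point (what the user assumed set B does)."""
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def fingertip_point(landmarks, tip_index=8):
    """Track a single fingertip landmark directly (closer to set A's pointing gesture)."""
    return landmarks[tip_index]
```

The fingertip version depends on one well-defined landmark, while the averaged palm point shifts whenever any finger moves, which is roughly the distinction the user was getting at.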

Lastly, relating to the insight above about smaller gestures being more compatible with smaller screens that require longer processes, users mentioned that this set of gestures is more suitable for brief interactions: because the gestures involved are so straining for the arm, the hand shouldn’t have to be held out for too long a period.

Below is a quick summary of keywords rated either positive or negative under both the A and the B variants of the gestures. There is no specific reason behind the placement of the text within the boxes.

(P.S. the photos can be clicked on and then zoomed in for a closer look; the text is quite small)

A/B Testing

The next step after evaluation is to A/B test the final two sets that were narrowed down from the bigger collection of gestures. For the A/B test, I figured that instead of splitting people into two user groups, I would have every user interact with prototypes using both the A and the B variants that we evaluated and finalised from the mountain of gestures collected from user tests no.1 & no.2.

Below are gifs that demonstrate the flow of the two sets of gesture patterns. The browsing isn’t illustrated properly, but the same gestures are involved; it just adds a dragging motion to “scroll” before the fist or pinch is let go (the start and finish states).

A: Point to Pinch to Point-pinch-drag-point

B: Palm to Grab to Palm-grab-drag-palm

About A/B Testing

What is it? A/B testing helps you observe how one version of something performs against another. It means testing multiple versions of a design to determine which version generates the most conversions. What’s the user’s preference? Which one performs better?

Why? A/B testing eliminates guesswork and avoids assumptions; it validates or invalidates hypotheses and helps us understand which features and elements best improve the user experience.

How? 1. Research 2. Formulate Hypothesis 3. Create variation 4. Run!

Formulate a hypothesis by predicting what you expect the result to be.
Observe which version is performed more intuitively, and ask for feedback, e.g. why users prefer A over B or vice versa.
Give the A/B test enough time to produce useful data.
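As a side note on sample size and “enough data”, below is a minimal sketch of how preference counts from a test like this could be sanity-checked with a simple two-sided sign test. The 9-versus-3 split in the example is hypothetical, purely for illustration, and not my actual tally.

```python
# Minimal sketch: a two-sided binomial sign test on A-vs-B preference counts.
# The 9/3 split below is hypothetical and only illustrates the check.
from math import comb

def sign_test_p_value(prefers_a: int, prefers_b: int) -> float:
    """Two-sided p-value for 'both variants are equally preferred' (p = 0.5)."""
    n = prefers_a + prefers_b
    k = max(prefers_a, prefers_b)
    # Probability of a split at least this lopsided under a fair coin, doubled for two tails.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(sign_test_p_value(9, 3))  # ~0.15 -- a dozen users alone is not statistically conclusive
```

The point is not to chase statistical significance in a qualitative test like this, but the sketch shows why a sample of around a dozen users mainly supports qualitative conclusions, and why the “why” answers matter more than the raw counts.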

Questions (to myself): What is my hypothesis? What do I expect the results to be? How large is the sample size? Which metrics am I measuring? Should I tweak the prototype before testing?


Questions (to users):
After each set
How did you feel about these gestures?
Which specific gesture felt most comfortable to perform?

After both sets
How did the gesture sets feel compared to one another?
Which one felt more comfortable to perform?
Which one do you prefer? Why?

Evaluation: The Final Two

After evaluating all 11 gestures used in the four gesture combinations narrowed down from user tests no.1 and no.2, a specific few stood out from the crowd and were placed into “gesture heaven”: the favoured gestures of them all, both for the users that participated in the user tests and for myself and Thomas, after looking at all the gestures in a holistic manner.

From these selected few gestures, we managed to piece together two final contenders for the final verdict, while keeping them as different from each other as possible.

Once these two were selected, I took some photos from the front to imitate what the computer vision would see in the context of interacting with a public screen. This helped me gain a deeper understanding of why detection would work for some gestures but not for others. I concluded that the A and B options above are both strong contenders because, even though the hands can potentially get cut off when moved out of the camera’s view, these gestures all have clear points for the software to quickly pinpoint. Furthermore, the two browsing gestures both have start-and-finish states, which makes detection easier on the software side and means false positives can be avoided more easily (e.g. if we browse/scroll with the two-finger gesture, then once you scroll up and point two fingers up, you need to be careful not to bring the two-finger gesture back into the camera’s view, or you may accidentally scroll the window down again).
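To make the start-and-finish idea concrete, below is a minimal sketch of such a stateful check. It is my own illustration with hypothetical inputs (a per-frame “is the pinch held?” flag and a pointer position), not the actual prototype logic.

```python
# Minimal sketch of a start/finish gesture state machine (illustrative only).
# Scroll events are only emitted while a pinch is held, so a stray pointing
# pose outside the pinch window is ignored, reducing false positives.

IDLE, DRAGGING = "idle", "dragging"

class BrowseStateMachine:
    def __init__(self):
        self.state = IDLE
        self.last_y = None

    def on_frame(self, pinch_active: bool, pointer_y: float):
        """Feed one frame: is the pinch held, and where is the pointer vertically?"""
        if self.state == IDLE and pinch_active:
            self.state, self.last_y = DRAGGING, pointer_y    # start state
        elif self.state == DRAGGING and pinch_active:
            delta, self.last_y = pointer_y - self.last_y, pointer_y
            return ("scroll", delta)                         # only scroll mid-drag
        elif self.state == DRAGGING and not pinch_active:
            self.state, self.last_y = IDLE, None             # finish state
        return None
```

Because scrolling only happens between the explicit start and finish, bringing the hand back into frame afterwards does nothing, which avoids the accidental-scroll scenario described above.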

Evaluation of Selected Gestures

To further narrow down the four selected gesture combinations, Thomas and I held two separate workshops where we tried out all the different combinations while rating the individual gestures in each combination with a LO, M, or H rating.

The gestures were rated under five different criteria that we believed were relevant in the context of implementing gesture control on public screens: False Positives, Ergonomics, Feasibility (for the users), “Naturalness”, and Detection.
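As a small aside, below is an illustrative sketch of how LO/M/H ratings across these five criteria could be tallied to compare combinations. The combination names and scores are placeholders, not our actual workshop ratings.

```python
# Illustrative only: tallying LO/M/H ratings per gesture combination.
# The ratings below are placeholders, not the real workshop scores.
SCORE = {"LO": 0, "M": 1, "H": 2}

ratings = {
    "Combination 1": {"False Positives": "H", "Ergonomics": "M",
                      "Feasibility": "H", "Naturalness": "M", "Detection": "M"},
    "Combination 2": {"False Positives": "M", "Ergonomics": "LO",
                      "Feasibility": "M", "Naturalness": "M", "Detection": "H"},
}

for combo, per_criterion in ratings.items():
    total = sum(SCORE[r] for r in per_criterion.values())
    print(f"{combo}: {total} / {2 * len(per_criterion)}")
```

Each criterion is described in turn below.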

False Positives:

A false positive is an error in binary classification in which a result incorrectly indicates the presence of a condition. What we mean here is how high or low the potential is for accidentally performing an action that you did not intend to perform (e.g. selecting when you wanted to scroll).

Ergonomics:

Human factors and ergonomics is the application of psychological and physiological principles to the engineering and design of products, processes, and systems. Here we rank how comfortable the gesture is to perform, so users won’t be stressed by an awkward posture in which the musculoskeletal system is uncomfortably positioned. This is a factor to consider because the more uncomfortable and physically straining a task is to perform, the less inviting it becomes.

Feasibility (for the users):

Feasibility, broadly defined, is the degree to which something can be made, done, or achieved, or is reasonable. In general, we can take feasibility as a broad term for determining whether a design or service is implementable, and many other characteristics fall under it. It’s important to assess the feasibility of a design to stay grounded and not chase unachievable goals. For this criterion, we consider feasibility from the user’s side: do we think a large group of users can easily adapt to this gesture? Is it familiar to them? Or will many breakdowns occur?

“Naturalness”:

A design should always feel intuitive and “natural” to use or perform. “Naturalness” was a criterion we assessed because we wanted to compare and determine which gestures feel the most natural to perform in the contexts we have set up.

Detection:

Finally, detection by the software is a strong criterion we had to consider because it determines the general feasibility and implementability of the gesture design. It’s easy to come up with many outrageous ideas for hand gestures, but can they all be picked up by the software?
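Since the detection criterion is ultimately a software question, below is a minimal sketch of the kind of check the front-facing photos support. It assumes MediaPipe Hands as the detection library and a hypothetical image file name; the project’s actual detection software is not specified here.

```python
# Minimal sketch: can a hand be picked up at all in a still photo?
# Assumes MediaPipe Hands; the image path is hypothetical.
import cv2
import mediapipe as mp

def hand_detected(image_path: str) -> bool:
    """Return True if MediaPipe Hands finds at least one hand in the photo."""
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    return results.multi_hand_landmarks is not None

# print(hand_detected("front_facing_pinch_photo.jpg"))  # hypothetical file name
```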

Results:

Narrowing down

After visualising all the gestures that were used in front of me, Thomas and I got together to quickly break the gestures down, based on general feasibility, into four different sets, with each set including an aiming gesture, one or two selecting gestures, and one or two browsing gestures.

We chose one open-hand aiming gesture paired with others that made sense, one finger-pointing aiming gesture paired, again, with selecting and browsing gestures that “made sense”, one random pairing (D), and one that was “my personal favourite”.

Narrowing down further? Some Lo-fi prototype/testing. See next post.

User test no.2

The first round of user tests is done (the one with the more realistic prototypes – see below), but it feels like my findings were somewhat abstract. It did, however, help me generate good insights into which interactions are the common denominator of any public screen UI: (initiation) – which can be passive, hence the parentheses – navigating, browsing, and selecting.

After a small consultation session with Thomas, I decided it was time to step back and identify “common UIs”, or typical user interfaces you visit on a day-to-day basis, and set up very simple lo-fi interfaces to test with different people. Without all the additional distractions on the screen (like in the first round of user tests), it might be easier to concretize which gestures are favored or used based on the users’ natural instincts. To identify these patterns in people’s behavior, I created very simple UIs inspired by apps and websites everybody visits often: I did some quick research, mapped out some screenshots based on statistics I found on Google, and compared them in order to identify patterns between genres of applications and their UI patterns. With these common UI patterns identified, I designed UIs with elements you tend to find in everyday UIs, for example a generic menu with 9 selectable items, a scrollable vertical menu with smaller boxed modals, horizontal scroll menus, UIs with more than one section of menus, carousel menus, and UIs with a chain of text involved.

The specific patterns of behavior I’m trying to identify are four main interactions: 1. Initiation, 2. Browsing, 3. Navigating, and 4. Selecting.

INITIATION

Initiation, or activation of a screen, is crucial because the computer cannot be constantly turned on; it saves electricity, for example, if we don’t want the screen to be on and running the entire time, so some sort of gesture to activate the screen is desired. To test initiation, I created a blank, black canvas, narrating during the tests that the screen is black because it hasn’t been activated yet: “How would you activate this screen if you are not allowed to use your voice to control it, or use capacitive touch to interact with the screen?”

BROWSING

For browsing, it was important to see how people would browse through a menu with multiple items. The key behaviour I wanted to identify here is how people scroll a menu, especially when menus are oriented differently: for example, will gestures differ when interacting with a horizontal versus a vertical menu? Do gestures differ when scroll bars or buttons are added? Furthermore, I wanted to see whether people keep their hands up in frame (just like you don’t let go of the mouse when controlling a cursor), or whether they put their hands out of the frame when resting.

NAVIGATING

In terms of navigation, I mainly wanted to identify how people point towards certain items in a menu. For example, do people use the cursor, or do they simply click directly on the item, like with capacitive touchscreens, without navigating towards it? I also wanted to see how people “go back” to the previous page or minimize modals: do they use gestures that signify “back”, or do they click the arrow button that I present in some interfaces and not in others?

SELECTING

Finally, in most cases the menus afford being selected. During the user tests, I ask the user to select a certain item (highlighted in yellow); after seeing how they browse through the interface and navigate towards the requested item, I want to see how they select or “enter” it. Selecting is possible for menu items, back arrow buttons, scroll bars (on drag), and more.

Gesture highlights picked up from testing

This is a presentation of all the key gestures I’ve collected, based on recordings of 4 of the 6 user tests I conducted to extract people’s natural instincts when interacting with gestures.

User 1

“OPEN THE CURTAINS”

This gesture was intended to signal the initiation of an interface. The user used it in an attempt to “start” when the screen implied he should “hover over”. He used it because he wanted to proceed to the next page. For this example, I tried multiple opening interfaces to see if the user would react differently; when the hand-waving gif was not yet present on the interface, the user opted for this gesture to initiate the “start”.

“THE WAVE”

The wave gesture is what the designed interface tries to convey: the idea is to make the computer vision notice the presence of a person by having the person initiate a wave, to say “hello”. This gesture was used when the hand-waving gif was present.

“THE NUMBER POKE”

The user’s first instinct when navigating a multi-option menu was to present a number using his fingers, signaling that he wanted either the first, second, third, or fourth option to be selected.

“THE PALM CURSOR/PAN AND GRAB”

The user changed his instinctive number-poke gesture into a pan-and-grab gesture when navigating the menu, after noticing that the cursor is present and can be controlled by hovering, with grabbing = selecting.

“GRAB AND PAN”

The grab-and-then-pan gesture was used when the user was navigating a catalogue, where he tried to “flip” or “scroll” to the next page.

“THE NUMBER POKE TO GRAB”

The user used the number-poking gesture again when wanting to select an item on a page with multiple selections. When selecting, he used the grabbing gesture, which appeared smooth because he simply had to shut his hand into a fist from pointing a “five”.

“THE PALM CURSOR TO GRAB”

The user used the palm-panning gesture again to navigate a menu with multiple selectable buttons. He didn’t use the number poke for this one (why the inconsistency?). Again, grabbing to signify selecting an object.


User 2

“VIRTUAL TABLET”

This user utilized their finger to “virtually” scroll and tap the screen when wanting to select an item from multiple options.

“THE WAVE”

The wave gesture is what the designed interface tries to convey: the idea is to make the computer vision notice the presence of a person by having the person initiate a wave, to say “hello”. This gesture was performed by this user as intended by the interface, where a hand-waving gif as well as text instructions for how to proceed were presented.

“ZOOMING AND MINIMIZING”

When presented with a map, the user had the instinct to zoom in to view the information presented clearly. She proceeded to open up a fist to signal zooming in, and vice versa for zooming back out.

“THE POKE”

When selecting an item, this user opted to use her fingers to “tap” in the air, while her hand/fingers were not too close in proximity to the screen.

“NEXT PAGE PLEASE”

When presented with the start screen of the SAS infotainment interface, despite my attempts at putting the hand-waving gif up front, this user still opted for a quick swiping hand gesture to move on to the next page.


User 3

“THE PALM CURSOR”

The user, when presented with a menu with multiple options that open up modals, used their palm as a virtual cursor to slowly pan between the different options available.

“DON’T PUSH THE RED BUTTON”

When selecting an option, the user used a “pushing” gesture to signify selection. Imagine having a big control panel and pressing a big button the size of your palm.

“THE WAVE”

The wave gesture is what the designed interface tries to convey: it tells you to wave to start scanning. The idea is to make the computer vision notice the presence of a person by having the person initiate a wave, to say “hello”. The wave was the user’s first instinct upon seeing the text “wave to begin scanning” on screen. There was no hand-waving gif in this example, but the user performed the waving gesture anyway, unlike some other users who didn’t until they saw the gif.

“PAN AND PRESS”

The user maneuvers their palm, acting as a cursor, to the pressable button and does the push-button gesture again to make his selection.

“THE NUMBER POKE”

When presented with a more complex interface like this with multiple text options, the user presented numbers with his fingers to convey which option he wanted, despite there being no numbers beside the options.

“THE DOUBLE NUMBER POKE”

A follow-up to the number poke. Here you can see that the user was trying to input two options, three and five, by gesturing numbers three and five with his fingers. I call it the double number poke.


User 4

“THE WIDE SWIPE”

The user explained this gesture as “like turning a page in a book”. He performed it when the instructions to “wave” at the screen were not present.

“THE GRUMPY OLD MAN”

When then presented with text instructions on the screen to “wave to begin scanning”, the user stated that he would probably still do the “wide swipe” gesture three times, get annoyed, read the instructions on the screen again, and then proceed to do the wave.

“THE INTIMATE POKE”

According to the user, when presented with a “button”, he would of course do the poking gesture, with his finger close in proximity (as opposed to a previous user who did the poke from afar).

“THE WAVE”

When presented with instructions to “wave” at the screen for this example, the user went directly for the wave because the text instructions were more “prominent” on the screen compared with the previous example.

“THE SCROLL-UP”

When asked how he would scroll up a rather complex text-based interface, the user performed this scroll-up gesture. The whole hand is involved, not just a finger.

“THE SMALL SWIPE”

When presented with multiple options in a menu for locating a store in a mall on an info kiosk interface, he would opt for doing small swiping gestures, kind of like dragging an item from the left-hand menu and bringing it up onto the map.

“THE THREE-FINGERED ZOOM”

When presented with a map, the user had the instinct to zoom in to view the information presented clearly, opening up a fist to signal zooming in, and vice versa for zooming back out. The user compared it to “like on the iPhone”.

“THE SWIPE UP TO OPEN”

When presented with a graphic menu where elements can be opened to present new windows, this user wanted to swipe up to open. Maybe it’s because he noticed the undertext of each element sliding upwards, which triggered that response, but we cannot be sure.

Digital versus Physical testing

*By digital I mean testing prototypes at a distance via e.g. Zoom, Discord, or Microsoft Teams, and wizard-of-ozzing the “functionality” from behind the screen; physical means I’m testing people in person, sitting next to them and moving the cursor based either on how I think they’re controlling it or on asking the user directly where to move the cursor.

I’ve found that testing through the screen at a distance has been easier: according to the testers, the fact that the cursor moved accordingly felt realistic and more immersive, and because they couldn’t see me physically moving the cursor, the experience felt more convincing. On the other hand, when I was testing people in person, the whole interaction felt a little forced, and though the testers reported “having fun”, it was more confusing to them what they had to do, while the testers at a distance knew immediately what to do after hearing my instructions and seeing the interface.

The beginning of testing is always tedious…

I’m going to start testing today with the interfaces I’ve made, but it’s difficult because I’m aiming to test people at a distance, virtually if you may. How that will work is that I will share my screen and pay close attention to their instinctive response, i.e. their initial response when they’re told that they’re not allowed to talk to or touch the screen (I will also try to probe so that people don’t propose voice control)… I will of course follow the “test” up with follow-up questions; I don’t have a structured set of questions yet, as every interview will vary quite a lot, so I will just pose spontaneous questions. When doing face-to-face interviews it will be easier: I’ll mount a camera (after asking for permission) and record the users’ behavior. I’ll try to control the interface by wizard-of-ozzing, e.g. asking them, “Where are you moving the cursor to right now?”, etcetera.