
Thesis for the Master's degree in Computer Science

Speciale for cand.scient graden i datalogi

OffScreen Interaction
Karampis Panagiotis
Supervisor: Sebastian Boring
Department of Computer Science, University of Copenhagen
Universitetsparken 1, DK-2100 Copenhagen East, Denmark
lwh738@diku.alumni.dk

April 2015


Abstract
Modern desktop paradigms are operated through keyboard combinations, mouse clicks and even mouse-pad gestures, to which users are tied and with which, after many years of usage, they interact fluently. Despite the vast evolution of the available ways of interacting with Virtual Reality, the fundamental principle of interaction remains the same: the use of the familiar physical devices (keyboard, mouse) attached to the computer. We present Offscreen Interaction, a system that utilizes the space around the screen as a window storage area while we interact with the computer screen through a pluggable, gesture-recognizing device. The aim is to understand how users react to the presence or absence of visual feedback, and whether grouping windows while positioning them affects performance when no feedback is given. In a user study, we found that interaction was most efficient and effective when visual feedback was given; in the case of no visual feedback, we observed that participants achieved the highest performance by grouping windows or applying some subjective methodology.


Acknowledgements
To my family and Lela, who endlessly supported me through all this effort...


Contents

1 Introduction
1.1 Problem Description
1.2 Objectives - Thesis Establishment
1.3 Research Questions addressed
1.4 Thesis Structure

2 Related Work
2.1 Using secondary means
2.2 Using virtual space management
2.3 Using space around displays
2.4 Mid-air Interaction
2.5 Summary

3 Design
3.1 Considerations
3.2 The offscreen area
3.3 Gestures
3.4 Feedback
3.5 Frameworks

4 Implementation
4.1 Application Description
4.2 Development Languages & Frameworks
4.3 Hardware
4.4 Connectivity & Architecture
4.5 Basic workflow
4.6 Gesture algorithm workflow
4.7 Matching Coordinate Systems & Mouse Movement
4.8 Application Identification
4.9 Implementation of the offscreen and inscreen area recognition
4.9.1 The orthogonal cognitive area
4.9.2 Implementation of boxes
4.10 Abstract application workflow

5 Experiment - User study
5.1 Experimental Set-up
5.2 Experiment Procedure
5.2.1 Welcome - Brief explanation of interaction
5.2.2 User Learn Mode
5.2.3 Experiment
5.2.4 Demographic Information
5.2.5 Assessment
5.3 Participants

6 Results
6.1 Completion Time
6.2 Error Rate
6.3 Peculiar Observations
6.4 Subjective Preferences

7 Discussion

8 Conclusion - Future Work
8.1 Conclusion
8.2 Future Work

A Code snippets
A.1 Coordinate translation & mouse movement
A.2 Screen point conversion
A.3 Inscreen area
A.4 Offscreen area
A.5 Upon grab, extract windows under cursor information
A.6 Shuffle windows before experiment starts
A.7 Extract the title
A.8 Generate image
A.9 Move window (Single Feedback)

B Demographic Information Form

C Technique Assessment Form

List of Figures

1.1 Open windows arrangement before switching to a new task.
1.2 Open windows arrangement after switching to Sublime.
1.3 User points or gestures at a window. The window becomes selected.
1.4 User drags and drops the window in the off-screen area. The foreground is occupied by another window.
1.5 The window has been dragged into the offscreen area and is now available to be retrieved by reversing the process.
3.1 Offscreen area illustrated with dimensions.
3.2 Grab gesture sequence. Release sequence is the reversed grab.
3.3 Chrome application's image in box (0,1), current hand position in box (0,2) with red frame.
3.4 Chrome application's image in box (0,1) and hand position at (0,1).
4.1 Leap Motion's size and hand skeletal tracking during operation.
4.2 Leap Motion Controller structure. [56]
4.3 Leap Motion Interaction box: a reversed pyramid shape formed by the cameras and LEDs. [2]
4.4 Leap Motion Controller architecture. [2]
4.5 Offscreen Interaction - Abstract workflow of Basic Modules.
4.6 Gesture algorithm workflow: From initial state to grab state.
4.7 Gesture algorithm workflow: From grab state to release state.
4.8 Interaction box diagram. [2]
4.9 Box class diagram.
5.1 Set-up for left-handed mouse users.
5.2 Set-up for right-handed mouse users.
5.3 Notification with remaining windows number and title.
5.4 Notification showing next window to be fetched from offscreen.
5.5 Windows after shuffling.
5.6 Selected windows for the experiment.
6.1 Participant 1 fully matched FF, SF, NF.
6.2 Participant 1's grouping.
6.3 Participant 2's potential grouping.
6.4 Mean values assessed by participants for each feedback mode.

List of Tables

5.1 Feedback cycling for the first 3 participants. Afterwards, it repeats itself.
5.2 Sample Balanced Latin Square algorithm for windows.
5.3 Demographic information.
6.1 Abbreviations for the feedback modes.
6.2 Mean time differences between modes.
6.3 Mean error differences between modes.

Chapter 1

Introduction
Window switching is one of the most frequent tasks a user performs and can occur several hundred times per day. Numerous window operations, such as moving, resizing and switching, are performed when we work with multiple windows. Managing such activity would help enhance users' computer experience and effectiveness. Window switching can evolve into a genuinely complicated task: for example, a developer may need to switch to the browser to look up documentation, switch back to the IDE to write code, switch to the terminal to test and push the code, check emails and update completed tasks. Window switching is unavoidable; even on larger screens, users tend to consume all available screen space and create even more windows to navigate to [23]. Generally, users rely on the Operating System's window manager to provide a convenient way to manage open windows according to their needs so that they are easy to retrieve. Switching windows is divided into two subtasks: finding the desired window and then bringing it to the foreground. Widely accepted techniques for both subtasks are performed either directly with the mouse or with keyboard key combinations.

1.1 Problem Description

Interacting with objects scattered across the screen using the mouse can be troublesome, as the mouse's movement is limited to the area of the desk. It gets even more problematic when drag and drop is involved, where one miss-click restarts the process. Furthermore, the mouse cursor is limited to the screen, so any interactions are restricted to this plane. 3D interaction with the screen has been researched and several input devices have been developed that enable users to manipulate virtual reality (VR), for example virtual hand and depth cursor techniques. Such techniques, along with the corresponding input devices, have been considered inadequate and appropriate only under specific circumstances. Additionally, the hardware used is far too expensive, such as the systems used to track bodies in the cinema industry [53], [42].

Several Window Managers (WM) have been implemented to visualize this process. Windows 7 presents each window in its own frame, even if several windows belong to the same application. Mac's Mission Control tiles open windows so that they are all visible at once and simultaneously stacks windows of the same application, for all applications that belong to the same desktop or workspace. Gnome 3 assigns each window its own frame but presents windows of the same application stacked. The way to navigate between open windows varies between Window Managers. The Alt+Tab (Cmd+Tab) combination is available across all systems for consistency reasons, although cycling between windows of the same application varies from one Window Manager to another. However, Robertson et al. [47] showed that stacking windows under the same application confuses many users, because the application's windows may not be related to the same task, where a task is defined as a collection of applications organized around a particular activity.

1.2 Objectives - Thesis Establishment

We propose a different technique for switching between tasks, introducing human-computer interaction (HCI) into the function of window switching. As there are many HCI patterns, including body posture, hand/finger gestures, speech recognition, eye movement and so on, we chose to use hand gestures, as they are the most trivial and the easiest for users to get familiar with.
Using Mac OS X's Mission Control to switch between applications can be cumbersome. For example, windows are re-arranged every time another window gets focus or a new one is created. In the following scenario the user is writing a document, so the application Pages is selected; then the user wants to switch to Sublime, activates Mission Control by pressing F3 and selects Sublime from the upper left corner. Figure 1.1 shows the windows' initial arrangement.

Figure 1.1: Open windows arrangement before switching to a new task.

At that point, the user wants to switch back to Pages and presses F3. Figure 1.2 now shows a completely different screen, where the user has to search and find where Pages is located in order to select it.

Figure 1.2: Open windows arrangement after switching to Sublime.

From that point on, switching between Pages and Sublime does not trigger re-arrangement, unless a third task interleaves. Then again, the user will have to find the correct window in a rearranged list.
We also experienced this behavior when using the common Cmd+Tab paradigm: after switching between tasks using the keyboard, Mission Control had rearranged the windows. A preliminary experiment conducted with users both familiar and unfamiliar with Mac OS X showed that users were frustrated by this mechanic, were unable to understand the pattern, and stated that they expected the windows' order to be clear at first sight rather than having to spend time investigating how windows are arranged. All users reacted positively to arranging windows in a linear form such as rows or columns.
The contribution of this research is to design a new human-computer interaction utilizing gestures in the offscreen spatial area. This would allow a more effective task-switching paradigm by placing and retrieving desired windows in the space around a computer screen. This space is defined by the cone-shaped area in which the LEAP Motion operates. Figures 1.3, 1.4, 1.5 demonstrate this functionality.

Figure 1.3: User points or gestures at a window. The window becomes selected.



Figure 1.4: User drags and drops the window in the off-screen area. The foreground is occupied by another window.

Figure 1.5: The window has been dragged into the offscreen area and is now available to be retrieved by reversing the process.

1.3 Research Questions addressed

Here, we present the scientific research questions addressed by Offscreen Interaction. In terms of goals, we want to explore the impact and efficiency of such a 2.5D¹ pointing system under different visual feedback modes, bringing users one step closer to freeing themselves from operating solely with the mouse. Therefore, the main questions that this thesis will try to answer are:

Q1: How important is the role of visual feedback in VR systems? To what extent does it affect performance?
Feedback and HCI techniques are inextricably connected. For instance, we get feedback through an on-screen dialog when we press Alt+Tab to switch windows, or we see the mouse cursor move on the screen when we move the mouse (full feedback). What would our experience in HCI be if there were no dialog when pressing Alt+Tab (no feedback), or if, when hovering over menus, the change of color (the typical behavior) appeared only occasionally (partial feedback)? We need to investigate this behavior in our system and examine how feedback influences performance in the offscreen area.

Q2: Do users interact randomly with a non-visible area or do they develop techniques?
Another factor that would vastly improve performance and effectiveness is grouping according to subjective patterns, or following some specific subjective methodology, when positioning windows in the offscreen area. As we explain in the experiment (5.2.3), we would like to see whether such formations exist; this, we consider, would give us directions for future research.

¹ With the term 2.5D we mean that although we interact with a typical 2D computer screen, the actual pointing happens in a 3D space, hence the term 2.5D.

1.4 Thesis Structure

The structure and content of the thesis are briefly described below. This work consists of 8 chapters, including this one, structured as follows:

Chapter 2: Publications related to several forms of interaction (secondary means 2.1, virtual space management 2.2, using space around displays 2.3, mid-air interaction 2.4) are presented. We cite papers that are highly relevant to our work and others in similar fields, as this study merges a variety of considerations. In this chapter we aim to help the reader comprehend current limitations and approaches in the area as well as introduce the reader to the field.

Chapter 3: We present the design of our implementation and analyze the considerations behind Offscreen Interaction, backing the choices made with scientific facts. The major design choices (offscreen design, gesture choice and feedback modes), which are listed and explained, will help the reader to better understand our work and act as a preliminary but required step before reading about the implementation.

Chapter 4: We present the implementation of the prototype application built to test Offscreen Interaction. We provide thorough details on the way it works, its architecture and workflows, as well as on the way the prototype was built. Small code snippets can be found in this chapter, while the functions we consider most important can be found in Appendix A.

Chapter 5: All information regarding the user study we performed is included. We present and explain the phases of the experiment along with its modes for each participant who took part in the study. Statistical demographic results are also shown at the end of the chapter.

Chapter 6: We present the results of the user study, separated into corresponding categories. Each category represents a specific metric we tested. We also provide visualizations through graphs and state our observations after investigating the log files containing data on the positioning and order of windows during the experiment.

Chapter 7: First we state our hypotheses on the results. Secondly, we discuss the results based on the findings of the ANOVA analysis performed previously and on the participants' assessment. Finally, we comment on whether the overall concept and the assumed hypotheses hold.

Chapter 8: The final conclusions are presented along with identified and proposed research areas for future work.

Chapter 2

Related Work
The work described in this thesis aims to develop a prototype model for interacting with computer displays using an external device that bridges the human 3D environment with the desktop metaphor of the computer screen in order to perform task-switching operations.
This chapter provides an overview of existing techniques, technologies and implemented frameworks. Although there is no clear distinction between the several themes mentioned below, we categorized existing work into four themes: using secondary means, space around displays, virtual space management, and mid-air interaction.

2.1 Using secondary means

Interacting with a display by using secondary means indicates that the user can manipulate and interact with what is shown on a display through another interface. Such an interface can be a mobile's, PDA's or touchpad's screen, or a tabletop, using gestures with the dominant or both hands.

A secondary display, such as a PDA's display, allows a user to remotely switch between top-level tasks and scroll in context using the non-dominant hand while the dominant hand operates the mouse [39]. While the system provides such advantages, it is worth noting that switching tasks through a PDA's screen can be cumbersome and prone to errors of not selecting the desired task due to the small resolution. Although current technology enhances PDAs with higher resolutions, fast switching between tasks is limited by having to focus on the PDA's screen, identify the correct task from the list, select it and focus back on the computer's screen.

Numerous researchers have explored the manipulation of objects within the design and animation computing area. Multi-point touch tablets have existed for a long time [34], yet the Multi-finger Cursor Techniques [38], implemented on a modified touchpad, and the Visual Touchpad [36], which projects hand gestures onto a computer's screen, allow a degree of hand freedom and interaction with screen objects.


Boring et al. implemented a system that allows users to manipulate devices that are incapable of touch interaction. By capturing video using a smartphone, along with targeting techniques such as Auto Zoom and Freeze, they were able to precisely interact with an object at a distance [11].

When interacting with secondary means, areas close to the keyboard and mouse are most appropriate for one-handed interactions, as shown by Bi et al. [10]. Their Magic Desk integrates the desktop environment with multi-touch computing and provides a set of interaction techniques allowing the computer screen to be projected onto the desk. The DigitalDesk [58] is built as an ordinary desk, but with additional functionality that emphasizes physical paper interactions. The desk uses the reverse of the common approach: instead of representing physical objects as digitized objects, it uses computing power to enhance the functionality of a physical object. With video-based finger tracking and the use of two cameras to scan documents and project data onto the desk, the DigitalDesk gives digital properties to physical papers.

Malik et al. [37] and Keijser et al. [31] use multi-hand gestures to enable the user to control an object and the workspace simultaneously, thus allowing the user to bridge the distance between objects, and offer the user a wider range of gestures by allowing the two hands to work together. However, this has been accomplished through additional hardware and vision-based systems. Another approach to manipulating 3D objects is Sticky Tools [19], which achieves all 6DOF without additional hardware, allowing users to manipulate an object using one, two or three fingers through the shallow-depth technique [18].

Part of our goal, and therefore our differentiation from the aforementioned systems, is a) to provide a system that supports whole-hand interaction, not just a limited number of fingers, b) to avoid external cameras, which are subject to occlusion, and c) to avoid stationary systems or limiting our interaction experience through secondary screens such as mobiles or PDAs.

2.2 Using virtual space management

Virtual space management refers to the concept of enhancing the user experience (UX) by applying algorithms and operations to existing or newly opened windows on the desktop. Several studies have been conducted on optimizing the way windows are resized, categorizing windows according to user needs, or even providing their own desktop metaphor.

The pioneer in this area, Rooms [21], is a system in which similar tasks are assigned to a separate virtual space (a room), while sharing windows between similar tasks is supported. The Task Gallery: a 3D window manager [48], a successor of Rooms, provides a 3D virtual environment for window and task management in which a 3D stage shows the current task while other tasks are placed at the edges of the stage, namely on the floor, ceiling and walls. Each task comprises its related windows. To allow users to have an overview of open tasks, a navigational system is introduced where users can go back or forward and thus see more tasks or focus on one.
While the system provides advantages and animation techniques, there are also disadvantages with respect to aiding users who want to complete many different tasks within a short time span. If, for example, a user wants to switch between two or more tasks, they will have to move backwards until they find the required task, select it, find the desired window in the loose stack and operate on it. To switch to the previous task the user will have to reverse the whole process. We can see that the complexity of such navigation increases with the number of tasks the user needs to perform. Furthermore, the time required to switch between tasks is increased by the one-second animations between moving forward/backward and opening/closing tasks.
Elastic Windows [29] and New Operations for Display Space Management and Window Management [25] endorse the need for new and advanced windowing operations for users with many tasks. The Elastic Windows implementation is based on hierarchically managing and displaying windows, with the root window being the desktop. The system supports inheritance, where some characteristics of the parent window can be inherited by its children. The philosophy of the system is that windows under the same parent cannot overlap but must consume all the available space. One side effect of this philosophy is that when one window is resized, all of its children and all windows of the same parent are also resized, leading to a whole-desktop re-arrangement from a single resize. Another side effect, arising directly from the no-overlapping rule, is that as the number of child windows increases, the effectiveness of the displayed information decreases. Similarly, QuickSpace [24] automatically moves windows so that they will not be overlapped by a resizing window. All these techniques rely on the existence of empty space, which may often not be available even on multiple-monitor systems, as shown by Hutchings et al. [23].
Scwm [8] is a parametrized window manager for the X11 window system and is based on defining constraints between windows. These constraints are expressed as user-defined relationships between windows. While this system provides some advantages, such as operating on two related windows as one, it scales poorly with multitasking requirements: as the number of windows increases, so do the number and complexity of the relationships that have to be defined and maintained. Yamanaka et al. [60] approach virtual space management in a different way. Instead of creating algorithms that adjust window space when a new window is created or resized, the Switchback Cursor exploits the z-axis of overlapped windows on the Windows 7 operating system. The mouse cursor, upon a specific movement in conjunction with a specific key press, traverses and selects windows that lie below the visible one.
One approach to addressing the fact that users work on different tasks in parallel and switch back and forth between different windows is to analyze user activities and assign windows to tasks automatically based on whether windows overlap or not [59]. Another approach addresses the user's fast switching by analyzing the window content, relocation, transparency, or combinations of these [54], [35].


The majority of these techniques apply algorithms that manipulate the currently open windows when a new one is created. We would like to extend Rooms and the Task Gallery according to our needs, leaving the underlying resizing technique as it is: managed by the operating system. Thus, instead of using semantics such as move back, move forward, or go to the left room, we define our own semantics, which indicate where a window is virtually placed.

2.3 Using space around displays

The theme of space around displays refers to a concept where the physical (empty) space close to the interaction target, such as a computer or mobile screen, is used in conjunction with external sensors such as depth cameras and hand-gesture-recognizing devices to provide an interaction channel between the human and the computer.

The Unadorned Desk [20], which is our inspiration, is an interaction system that utilizes the free space on a regular desk enhanced with a sensor. It virtually places the off-screen, secondary workspace onto the desk, providing more screen space for the primary workspace and thus acting as an extra input canvas. The experiments conducted with the Unadorned Desk showed that interaction with virtual items on the desk is possible when the items are of reasonable size and number, with or without on-screen feedback.

The use of a Kinect depth camera mounted on top of the desk limits the mobility of the system and requires the user to stay in a specific place at the desk. As the Kinect camera is by nature prone to false detections under sunlight, the camera has to be placed carefully. Although the system works well when few items are virtually placed on the extra input canvas, the desk's surface has to be clear of physical objects, which is not always the case as desks tend to be messy.
Virtual Shelves [35] and Chen et al. [12] combine touch and orientation sensors to create a virtual spatial area around the user that allows invocation of commands, menu selection and interaction with mobile applications in an eyes-free manner. Wang et al. [55] demonstrated the benefits of using hand tracking for manipulating virtual reality and 3D data in a CAD environment, using two webcams to track 6DOF for both hands. However, such tasks are restricted to controlled, small areas targeting specific frameworks (CAD). The use of the Kinect camera is widespread when it comes to augmented and virtual reality. MirageTable [9], HoloDesk [22], FaceShift [57] and KinectFusion [26] are a set of applications supporting real-time physics-based interactions; however, as noted by Spindler et al. [51], tracking with depth cameras still has limited precision and reliability. SpaceTop [33] also uses a Kinect depth camera, together with a see-through display. Although it allows 3D spatial input, a 2D touchscreen and keyboard are also available for input. The unclear visual representation for guidance, as noted by the authors, is a subject that we need to take seriously. Physical space requires extra sensors, extra cameras and, most importantly, has to be free of objects. Our goal is to move the interaction area from solid objects such as a desk into a virtual area around the screen and thus eliminate the use of additional sensors and cameras, which make the system less portable.

2.4 Mid-air Interaction

There are occasions where interaction with displays has to be done from a greater distance than standing in front of the screen. By mid-air interaction we mean that the communication channel between the user and the display is the air/empty space, used through laser pointers, a Wiimote, virtual screens or even worn gloves.

Uses of mid-air interaction vary from point-and-click to manipulation and selection of 3D data. The latter was implemented by Gallo et al. [14] using a Wiimote controller to manipulate and select 3D data in a medical environment. Although the system was not evaluated, it was able to differentiate between two states: pointing and manipulation. In the pointing state, the Wiimote acts as a point-and-click laser pointer, whilst in the manipulation state the Wiimote interacts with a 3D object, with available actions of rotating, translating and scaling.
In [28], the authors utilize both a Kinect depth camera and skeletal tracking [30] to identify pointing gestures made by a person standing in front of the Kinect camera. The spatial area in front of the camera is treated as a virtual touch screen at which the user can point. To detect the direction of the pointing gesture, they detect and track the pointing finger using a minimum bounding rectangle and Kalman filtering.
Interaction techniques using lasers have the advantage of low cost and constitute the best-known perspective-based technique. A laser beam can act as a pointing control in a multi-display system. At the same time, laser beams suffer from the limitation that there are no buttons to augment the single point of tracked input, rendering mouse operations impossible. Additionally, laser pointer techniques suffer from poor accuracy and stability [41] and can be very tiring for sustained interactions. Olsen et al. [41] proposed an indirect input system and explored the use of dwell time to perform mouse operations. However, the installation cost and complexity of most systems are prohibitive when scaling up.
Kim et al. [32] tried to address these limitations by researching ways to embed sensors on the body, specifically a wrist-worn device that recognizes body movements and thus reduces the need for external sensors. The wrist-worn device consists of a depth camera combined with an IMU and recognizes finger movements through the use of biomechanics.
Laser interaction does not generally allow recognition of gestures and, although Jing et al. [28] implemented a point-and-click system, that system is stationary. The Wiimote lacks the ability to identify finger gestures, whilst worn sensors and gloves require the user to attach an external device to the skin, which might be uncomfortable. We propose a system that is mobile, with no skin-attached devices, which identifies hand gestures for both hands and, by extension, provides the functionality of augmented buttons if required.

2.5 Summary

We have seen that there is a variety of ways to interact with a computer screen, among them mid-air, third-party interfaces and cameras, using a wide range of methods and tools such as worn devices, finger biomechanics, laser pointers, mobiles, game controllers and so on. Each of the aforementioned themes has situational advantages and drawbacks and comes with implementations that provide an alternative user experience and interaction. This interaction with the computer screen comes at the cost of introducing either rather complex, non-mobile interaction systems or ways of window management that rely on algorithms rather than on users' desires.

We thus propose a work that combines features from virtual space management and mid-air interaction by keeping the interaction medium (mid-air) as simple as possible. At the same time, we provide a trivial window-management metaphor of select and then show or hide, giving the user the ability to resize and position windows the way they want.

Chapter 3

Design
In this chapter we describe the considerations to follow for a Virtual Reality interaction according to our needs, the offscreen area, the gestures, the feedback modes and the choice of framework that supports the interaction.

3.1 Considerations

Selection on screen: The initial event which enables the interaction by identifying the collision of a virtual object, i.e. any (partially) visible window, with the mouse cursor upon a grab gesture.

Selection off screen: The initial event which enables the interaction by identifying the hand coordinates in the offscreen area upon a grab gesture.

Drop in offscreen: The second-stage event that, upon a release gesture in the offscreen area, removes focus from the selected window (hide).

Drop in screen: The second-stage event that, upon a release gesture in the main screen area, gives focus to the selected window (show).

Select off screen & drop off screen: A two-stage event that cognitively moves an already hidden window to another box in the offscreen area, allowing organisation of windows.

It is worth mentioning that, since our system is considered a 2.5D pointing system with no need to implement interaction along the Z axis, we employ manipulation of virtual objects with four degrees of freedom (4DOF), namely up / down, left / right.

3.2 The offscreen area

The interaction box of the physical input device provides us enough space to cognitively extend the screen area on its top side; this space serves for cognitively saving data for assigned windows. This area is as wide as the screen and is scalable (up to a limit) to the screen proportions.

Smith et al. [50] observed that the average number of open windows a user keeps per session is 4 on a single display, while dual-monitor users keep 12 open windows. Based on such observations, we chose to divide the offscreen area into 8 boxes, thus allowing us to save state for 8 windows.

Figure 3.1: Offscreen area illustrated with dimensions.


The fact that human hand stability deteriorates with age, fatigue, caffeine and other factors [3] indicates that the offscreen area should be designed as large as possible, with boxes large enough to compensate for a user's unstable hand.

We thus cognitively divided the offscreen area into two rows of 4 boxes each, giving more freedom for hand movement without the risk of choosing the wrong box. Each box's width is calculated by the formula screenWidth/4, and each box is assigned a number between 0 and 3, which indicates its position on the X axis. Since we have two rows, we followed a Cartesian system with positions (X,Y). The X axis has domain values 0-3 whilst the Y axis has 0 and 1. Figure 3.1 illustrates this concept in detail.
Although the interaction box is wide enough to operate outside the screen width, we decided to keep the interactions strictly within the screen width and thus virtually extend the screen only along the Y axis. The intuition behind this choice is that we did not want to increase the difficulty of cognitively dividing the offscreen area, especially when no visual feedback was provided in the experiment.

Interaction with the application starts when the user places their left or right hand inside the interaction box. At that point, the user has complete control of the mouse cursor movement.

3.3 Gestures

The gestures implemented are classified as concrete. They are evaluated only after they have been completely performed; e.g., Selection on screen is only valid when the hand has gone from having extended fingers (release) to being a fist (no extended fingers).


Research on previous works [43], [44] has shown that gestures applied to physical input devices are preferred to match gestures that users would normally apply to physical objects. Such natural gestures allow the manipulation of virtual objects and are tied to users' knowledge and skills of the physical world.

Hand gestures are used to interact in the 3D space rather than pointing. The latter is commonly used in 2D view systems because of its simplicity and stable performance. Several user-defined gestures, such as pointing, pinch, grab and 2-hand stretch, have been classified according to user experience [44]. Out of the two possible candidates, pinch and grab, which imply natural and realistic interaction, we selected the latter: firstly because grab is closer to the natural interaction we target, and secondly because pinch is mostly used for other interactions such as rotation and shrink/stretch along different axes [27].

For our purpose, and based on the commonly used gestures presented above, grab is the crucial gesture for implementing interactions within the Virtual Reality space. The prototype therefore defines two gestures, Grab and Release, both applicable to the offscreen and the in-screen area. These gestures must be performed sequentially to complete an action, since they imply that the hand transits from one state to another.

Figure 3.2: Grab gesture sequence. Release sequence is the reversed grab.

Furthermore, the mouse cursor can only be over one window at a time, so only one window can be identified per grab. In addition, interaction is performed by only one hand, either the dominant or the non-dominant, according to user preference.

3.4 Feedback

We have designed the application to operate in three different modes: full feedback, single feedback and no feedback.

The full feedback mode, as shown in Figure 3.3, provides a window with information about all boxes, which is shown when the hand is in the offscreen area. The selected box is visually identified by a red border and, in case a window is cognitively saved in it, a screenshot taken at the grab event is shown to help the user identify the window when required.

The single feedback mode (Figure 3.4) provides a window with information only about the specific box associated with the current hand position in the offscreen area. Apart from the screenshot shown, the user gets extra visual help by having the application's title, e.g. Chrome, as the window title.

The no feedback mode provides no informational window nor any other information when the hand is in the offscreen area.

Figure 3.3: Chrome application's image in box (0,1), current hand position in box (0,2) with red frame.

Figure 3.4: Chrome application's image in box (0,1) and hand position at (0,1).

3.5 Frameworks

The nature of this research forces us to use low-level programming languages, frameworks and libraries as close as possible to the operating system, while using up-to-date programming paradigms (object-oriented programming). For this reason, languages that would require a wrapper to access native calls, such as Java, or languages mainly targeting web development, such as JavaScript, have been excluded from consideration even if the physical input device supports them.

Chapter 4

Implementation
Offscreen Interaction is an application that operates on a standard Mac with any screen size running OS X version 10.7 or higher. Offscreen Interaction cannot be ported to Windows or Linux based systems because a) OS X native libraries are used and b) it targets a replacement of Mission Control.

The implementation has been developed in the Objective-C language using Xcode. Xcode, developed by Apple for both OS X and iOS, contains on-the-fly documentation for all libraries alongside other utilities that enhance coding efficiency and experience, such as a debugger, tester, profiler, analysis tool etc. Swift, introduced on June 2, 2014, is a new programming language for writing Apple software. Xcode also supports developing AppleScript, a scripting language built into the Mac operating system since System 7.

Important code snippets that were crucial for the implementation can be found in Appendix A.

4.1 Application Description

Offscreen Interaction is an application that has been built to test the capabilities and limits of interacting with desktop applications. It aims to provide an area above the screen plane with which the user can interact in order to organize and switch between windows. The interaction with the user is performed through hand gestures received by a motion-tracking input device. Although Offscreen Interaction is not a complete application, it embeds all the infrastructure required to support the new interaction we are proposing. The basic scenarios are the following:

When the user grabs a window and releases it in the offscreen area, we should be able to identify which window it is, save its information in memory and hide it.

When the user grabs a window from the offscreen area and releases it in the screen, we should be able to identify which window it was by checking the memory data and then show it.

When the user releases a window in the offscreen area and the box is occupied, we overwrite this box with the new window's data and pop out the previously stored window.

When the user grabs a window from the offscreen area and releases it in the offscreen area, we swap the saved memory data if the box is occupied; otherwise we move the window to the new box.

Although this may sound trivial at first glance, Apple does not provide one universal API for interacting with, manipulating and identifying windows. As a result, several workarounds were implemented in order to overcome the limitations that were found. Specifically, between the Accessibility API and Cocoa, the limitation was that the Accessibility hierarchy is independent and separate from the window hierarchy.

4.2 Development Languages & Frameworks

Objective-C [40]. The primary programming language for developing OS X applications. Objective-C, as the name states, is an object-oriented programming (OOP) language, built as a superset of C.

Cocoa [13]. High-level API that combines three other frameworks (Foundation, AppKit and Core Data), included via the header file Cocoa.h, and automates many application features to comply with Apple's human interface guidelines [50].

Quartz [46]. Provides access to lower-level graphics services, composed of the Quartz 2D API and the Quartz Extreme windowing environment.

AppleScript [7]. Scripting language with an English-like syntax for the automation of repetitive tasks.

Accessibility API [5]. Extra libraries targeted at assisting users with disabilities.

Leap SDK v2.2.2 [49]. Provides access to the physical input device chosen for this application (the Leap Motion) through Objective-C calls.

Carbon (legacy) [6]. Legacy API, acting as a bridge between Cocoa and the Accessibility API.

4.3 Hardware

For identifying gestures and bridging the real and virtual spaces, the Leap Motion Controller is used. The controller is a small peripheral, 3 x 1.2 x 0.5 inches, not much heavier than a common USB stick. It utilizes two cameras and three infra-red LEDs serving as light sources to capture motion and gesture information. The system is capable of tracking the movement of hands, fingers or several other objects within an area of 60 cm around the device in real time. It can detect small motions and has an accuracy of 0.01 mm [15].

The small cameras that the Leap Motion Controller comes with cannot extract as much information as other systems with larger cameras. Because the embedded algorithms extract only the data required, the computational analysis of images is modest and therefore the latency introduced by the Leap Motion Controller is very small and negligible. The fact that the controller is small and mostly software-based makes it suitable for embedding in other, more complicated VR systems.

Figure 4.1: Leap Motion's size and hand skeletal tracking during operation.

Although few details are known regarding the Leap Motion's algorithms and underlying principles, as they are protected by patent restrictions, Guna et al. [16] attempt to analyze its precision and reliability for static and dynamic tracking. The official documentation [1] states that it recognises hands, fingers, and tools, reporting discrete positions, gestures, and motion.

Figure 4.2: Leap Motion Controller structure. [56]



J. Samuel [27] categorizes the controller as an optical tracking system based on the stereo vision principle; the interaction box of the controller is shown in Figure 4.3 below. The size of the InteractionBox is determined by the Leap Motion field of view and the user's interaction height setting (in the Leap Motion control panel). The controller software adjusts the size of the box based on the height to keep the bottom corners within the field of view [1], with a maximum height of 25 cm.

Figure 4.3: Leap Motion Interaction box: a reversed pyramid shape formed by the cameras and LEDs. [2]

It is important to mention that the controller is accessed through an API that supports different programming languages, among them Objective-C, which is our main programming language.

4.4 Connectivity & Architecture

The Leap Motion runs over the USB port with a system service that receives motion tracking data from the controller. Using a dylib (dynamic library) on the Mac platform exposes these data to the Objective-C programming language. Furthermore, the software supports connections over a WebSocket interface in order to communicate with web applications.

Figure 4.4 shows the architecture of the Leap Motion Controller, which consists of:

Leap Service, which receives and processes data from the controller over the USB bus and makes the data available to a running Leap-enabled application.

Leap control panel, which runs separately from the service, allowing configuration of the controller, calibration and troubleshooting.

Foreground application, which receives motion tracking data directly from the service while the application has focus and is in the foreground.

Background application, which allows reception of motion tracking data even if the application runs in the background, is headless or runs as a daemon.

Figure 4.4: Leap Motion Controller architecture. [2]

4.5 Basic workflow

The Offscreen Interaction application can be divided into the following modules:

Start Application - represents the launch of the application.

Window module - in this module the application's window controller is registered and listens to events for showing or hiding in the designated areas. This controller supports three modes: full feedback, single feedback and no feedback.

View module - responsible for visualizing the selected offscreen box.

Leap module - loads the application logic that handles motion tracking data from the Leap Controller.

Experiment module - responsible for guiding the user when conducting the experiment, although it is not a discrete entity, as it is included in the Leap module.

Logger module - saves various data to a log file when the experiment is conducted.

The flowchart of these modules can be observed in Figure 4.5. We notice that although the registration of modules is linear, the Window module in cooperation with the Leap module offers the variation between the three feedback modes, thus allowing the user to first get familiar with the cognitive offscreen area before actually conducting the experiment.

Figure 4.5: Offscreen Interaction - Abstract workflow of Basic Modules.

4.6 Gesture algorithm workflow

In this work, we have implemented an algorithm that hides and shows open applications, as well as organizing them in the offscreen area, based on the hand coordinates and the two gestures of grab and release. Figures 4.6 and 4.7 demonstrate how the algorithm works in general before we go into details in the next sections. The algorithm starts once a hand is recognized by the Leap service and executes per frame based on the tracking data given (coordinates and handStrength).

It is vital to mention a variable exposed by the API, named handStrength here and part of the LeapHand class, which indicates how close a hand is to appearing as a fist by measuring how curled the fingers are. Based on that, we can say that fingers that are not curled will reduce the grab strength and therefore reduce the probability of identifying a grab gesture, whilst fingers that are curled will increase the grab strength and therefore reduce the probability of identifying a release gesture. This variable has domain values [0..1], and we experimentally decided that a value >= 0.8 indicates a grab and <= 0.2 indicates a release, allowing us to identify the gesture in real time and not only when a gesture suddenly appears.
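A minimal sketch of this per-frame thresholding is shown below. It assumes the Leap Objective-C bindings expose the value as the grabStrength property of LeapHand (the variable referred to above as handStrength); the state enum and helper name are illustrative, not part of the prototype.

#import <Foundation/Foundation.h>
#import "LeapObjectiveC.h"

typedef NS_ENUM(NSInteger, HandState) {
    HandStateIdle,      // no decision made yet
    HandStateGrabbed,   // strength climbed above 0.8
    HandStateReleased   // strength dropped below 0.2 after a grab
};

// Called once per tracking frame; keeps the previous state while the
// strength sits in the ambiguous (0.2, 0.8) band, so a gesture is only
// reported once it has been completely performed.
static HandState gestureStateForHand(LeapHand *hand, HandState previous) {
    float strength = hand.grabStrength;   // 0.0 = open hand, 1.0 = fist
    if (strength >= 0.8f && previous != HandStateGrabbed) {
        return HandStateGrabbed;          // fingers curled: treat as grab
    }
    if (strength <= 0.2f && previous == HandStateGrabbed) {
        return HandStateReleased;         // fingers extended again: release
    }
    return previous;                      // keep the last decided state
}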

Figure 4.6: Gesture algorithm workflow: From initial state to grab state.

Figure 4.7: Gesture algorithm workflow: From grab state to release state.

4.7 Matching Coordinate Systems & Mouse Movement

The Leap Controller provides coordinates of fingers in real-world units (mm), so it is vital to translate these coordinates into screen pixels according to the screen resolution. The SDK provides methods to normalize values into the [0..1] range and obtain screen coordinates. We need to keep in mind that the top-left corner is (0,0) in OS X's event coordinates, whilst the Leap's (0,0) is at the bottom left, so we need to flip the Y coordinate.
LeapInteractionBox *iBox = frame.interactionBox;
LeapVector *normalizedPoint = [iBox normalizePoint:leapPoint clamp:YES];
int appX = normalizedPoint.x * screenWidth;
int appY = normalizedPoint.y * screenHeight;
appY = screenHeight - appY;

Having the screen coordinates, it is then trivial to control the mouse with the hand:

CGEventRef move1 = CGEventCreateMouseEvent(NULL, kCGEventMouseMoved, CGPointMake(appX, appY), kCGMouseButtonLeft);
CGEventPost(kCGHIDEventTap, move1);
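Putting the two snippets together, a small helper could look as follows. This is a sketch that assumes screenWidth and screenHeight are already known, and it adds the CFRelease of the created event, which the snippets above omit.

#import <ApplicationServices/ApplicationServices.h>
#import "LeapObjectiveC.h"

// Moves the system mouse cursor to the screen position corresponding to a
// Leap point, flipping the Y axis to match the Quartz origin (top left).
static void moveCursorToLeapPoint(LeapFrame *frame, LeapVector *leapPoint,
                                  CGFloat screenWidth, CGFloat screenHeight) {
    LeapInteractionBox *iBox = frame.interactionBox;
    LeapVector *normalized = [iBox normalizePoint:leapPoint clamp:YES];

    CGFloat appX = normalized.x * screenWidth;
    CGFloat appY = screenHeight - (normalized.y * screenHeight);  // flip Y

    CGEventRef move = CGEventCreateMouseEvent(NULL, kCGEventMouseMoved,
                                              CGPointMake(appX, appY),
                                              kCGMouseButtonLeft);
    CGEventPost(kCGHIDEventTap, move);
    CFRelease(move);  // we own the created event, so release it after posting
}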

4.8 Application Identification

The following algorithm shows the steps to identify the window under the mouse cursor after a grab gesture is performed in the in-screen area:
GrabIsFinished AND isInScreen
    Get mouse location using Cocoa
    Convert mouse location to CGPoint using Carbon API
    Get AXUIElementRef by CGPoint using Accessibility API
    Extract application PID from AXUIElementRef
    Extract application Title from AXUIElementRef
    Generate image of Application given the Window Title
    Scan windows in screen
        If kCGWindowOwnerName is equal to extracted Title
            Get the windowID
            Create bitmap of application by windowID
            Save data in custom object for use upon release

Listing 4.1: Abstract algorithm for identifying a window.

As we can see, three APIs (Cocoa, Carbon, Accessibility) cooperate for a single window identification.
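As an illustration of how these APIs can be wired together, the following sketch follows the steps of Listing 4.1 using public Accessibility and Quartz Window Services calls. It reads the owning application's name via NSRunningApplication instead of the kAXTitleAttribute mentioned in the listing, the function name is hypothetical, error handling is reduced to early returns, and the actual prototype may differ in details.

#import <Cocoa/Cocoa.h>
#import <ApplicationServices/ApplicationServices.h>

// Sketch: find the accessibility element under the cursor, read its PID and
// owner name, then look up the matching CGWindow and capture a bitmap of it.
static void identifyWindowUnderCursor(void) {
    // Cocoa mouse location (origin bottom-left) converted to Quartz (top-left).
    NSPoint mouse = [NSEvent mouseLocation];
    CGFloat screenHeight = NSMaxY([[[NSScreen screens] firstObject] frame]);
    CGPoint point = CGPointMake(mouse.x, screenHeight - mouse.y);

    AXUIElementRef systemWide = AXUIElementCreateSystemWide();
    AXUIElementRef element = NULL;
    if (AXUIElementCopyElementAtPosition(systemWide, point.x, point.y,
                                         &element) != kAXErrorSuccess) {
        CFRelease(systemWide);
        return;
    }

    pid_t pid = 0;
    AXUIElementGetPid(element, &pid);  // application PID of the hit element
    NSString *ownerName =
        [[NSRunningApplication runningApplicationWithProcessIdentifier:pid]
            localizedName];

    // Scan on-screen windows and capture the one owned by that application.
    NSArray *windows = CFBridgingRelease(CGWindowListCopyWindowInfo(
        kCGWindowListOptionOnScreenOnly, kCGNullWindowID));
    for (NSDictionary *info in windows) {
        if ([info[(id)kCGWindowOwnerName] isEqualToString:ownerName]) {
            CGWindowID windowID = [info[(id)kCGWindowNumber] unsignedIntValue];
            CGImageRef image = CGWindowListCreateImage(
                CGRectNull, kCGWindowListOptionIncludingWindow, windowID,
                kCGWindowImageBoundsIgnoreFraming);
            // ... store pid, ownerName and image in a box object, then release
            if (image) CFRelease(image);
            break;
        }
    }
    CFRelease(element);
    CFRelease(systemWide);
}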

4.9 Implementation of the offscreen and inscreen area recognition

In the Design chapter (3.2), we defined the offscreen area abstractly. Now we show how this cognitive area is implemented by combining screen coordinates and the Leap Controller's coordinate system.

4.9.1 The orthogonal cognitive area

In Section 4.7 we normalize hand coordinates to fit the screen dimensions, thus limiting the X and Y values to the screen. In order to overcome this, we also save the hand coordinate (X,Y) values in variables before the normalization takes place. Given that the interaction box of the Leap starts at 82.5 mm, which corresponds to pixel 0, and ends at 317.5 mm, which corresponds to the screen height (fig. 4.8), we can say that pre-normalized values higher than 317.5 refer to pixels cognitively greater than the screen size. Indeed, using the formula p = (y_k - y_0) * h / (y_n - y_0), where p is the (cognitive) pixel, y_k is the hand position in millimeters, h is the screen height (= 900) and y_0, y_n are the bottom and upper heights of the interaction box, we find that a hand position of e.g. 350 refers to pixel 1024, which in our case lies in the offscreen area.
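The formula translates directly into a small helper; the sketch below assumes the interaction-box bounds quoted in the text (82.5 mm and 317.5 mm) and takes the screen height as a parameter.

// Maps a raw Leap hand height (millimeters) to a cognitive pixel row.
// Values above the screen height land in the offscreen area.
static double cognitivePixelForHandY(double yMillimeters, double screenHeight) {
    const double y0 = 82.5;    // bottom of the interaction box, maps to pixel 0
    const double yn = 317.5;   // top of the interaction box, maps to screenHeight
    return (yMillimeters - y0) * screenHeight / (yn - y0);
}

// Example from the text: a hand at 350 mm with a 900 px screen height yields
// about pixel 1024, i.e. beyond the screen and therefore offscreen:
// cognitivePixelForHandY(350.0, 900.0)  ->  ~1024.5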
Having explained mathematically the translation from millimeters to (cognitive) pixels, we can now define the offscreen area code-wise by applying if statements:

if (appY == 0 && yy >= LOWER_BOUND && yy <= UPPER_BOUND && appX > 0 && appX < screenWidth)

Listing 4.2: Offscreen identification.

where appY is the normalized Y hand coordinate, yy is the pre-normalized Y hand coordinate calculated as described above, LOWER_BOUND is a small offset to distinguish clearly from the inscreen area, and UPPER_BOUND is a value that arbitrarily defines the height of the offscreen area.

When we want to refer to the inscreen area, it becomes trivial:

if (appY >= 0 && appY <= screenHeight && appX > 0 && appX <= screenWidth)

Listing 4.3: Inscreen identification.

We just require that appY is within the screen dimensions.


As the offscreen area extends no wider than the screen width, appX (the normalized X hand coordinate) is always bounded in the (0..screenWidth] range.

Figure 4.8: Interaction box diagram. [2]

4.9.2 Implementation of boxes

Since the number of boxes we defined in the Design chapter (3.2) is a constant (= 8), the width of each box is also constant. As per the design, we need to distribute these boxes in two rows, and therefore their size is calculated as screenWidth / (numberOfBoxes / 2), a number which depends only on the screen width. To find in which box our hand is along the X axis, we calculate box_x = appX * (numberOfBoxes / 2) / screenWidth and discard the decimals. For example, when the hand is at position appX = 465 on a 1440 px wide screen, we refer to box 1 (= 465 * 4 / 1440). For the Y axis, we map the yy value to 0 when it lies in [LOWER_BOUND, (UPPER_BOUND - LOWER_BOUND)/2] and to 1 when it lies in ((UPPER_BOUND - LOWER_BOUND)/2, UPPER_BOUND].
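A sketch of this mapping is shown below; the constant and function names are illustrative, and the lower/upper bounds are assumed to be defined as in Section 4.9.1.

static const int kNumberOfBoxes = 8;   // 2 rows of 4, as per the design

// X index: which of the 4 columns the (normalized) hand X falls into.
static int boxXForHandX(double appX, double screenWidth) {
    int columns = kNumberOfBoxes / 2;
    int box = (int)(appX * columns / screenWidth);   // discard the decimals
    if (box > columns - 1) box = columns - 1;        // clamp the right edge
    return box;
}

// Y index: 0 for the lower half of the offscreen band, 1 for the upper half.
static int boxYForHandY(double yy, double lowerBound, double upperBound) {
    double midpoint = (upperBound - lowerBound) / 2.0;
    return (yy <= midpoint) ? 0 : 1;
}

// Example from the text: boxXForHandX(465.0, 1440.0) evaluates to 1.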

We thus have the unique coordinates of any box, allowing us to build an Objective-C class and store its instances in an NSMutableDictionary using these unique coordinates as the identifying key. An instance of the box class (Figure 4.9) is created upon a release gesture in order to save a specific window's data, such as its title, PID and screenshot, which were generated when the user performed a grab on an open window.

Figure 4.9: Box class diagram.
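A minimal sketch of such a box class and its dictionary key follows; the property names (windowTitle, ownerPID, screenshot) are illustrative rather than those of the prototype.

#import <Cocoa/Cocoa.h>

// One offscreen box: remembers which window was parked in it at release time.
@interface OffscreenBox : NSObject
@property (nonatomic, copy)   NSString *windowTitle;   // e.g. "Chrome"
@property (nonatomic, assign) pid_t     ownerPID;
@property (nonatomic, strong) NSImage  *screenshot;    // taken at the grab event
@end

@implementation OffscreenBox
@end

// Boxes are keyed by their (X,Y) grid position, e.g. "1,0".
static NSString *BoxKey(int boxX, int boxY) {
    return [NSString stringWithFormat:@"%d,%d", boxX, boxY];
}

// Usage on a release gesture in the offscreen area, where boxes is an
// NSMutableDictionary owned by the Leap module:
//   boxes[BoxKey(boxX, boxY)] = grabbedBox;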

4.10 Abstract application workflow

Having defined the key algorithms and workflows above, we can now define the abstract workflow of our implementation, as shown in Listing 4.4.
Initialization
    Load modules (as described in Chapter 4.5)
End
Main Loop (per frame) onFrame()
    While hand found do
        Move mouse according to normalized hand position
        If hand is in offscreen area AND feedback mode is not none
            Show feedback window
        End
        Measure hand grab strength
        When grab strength <0.2 OR >0.8, gesture is finished
            Get gesture and position of event
            If gesture was grab
                Apply algorithm as described in Fig. 4.6
            End
            If gesture was release
                Apply algorithm as described in Fig. 4.7
                Free resources
            End
        End
    End
End

Listing 4.4: Abstract algorithm of the implementation.
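To make the grab-strength thresholds of Listing 4.4 concrete, the per-frame check can be sketched as below. The Leap SDK exposes hand.grabStrength in the range 0.0 (open hand) to 1.0 (fist); the grabbing flag and the two handler methods are our own illustrative names, not part of the prototype:

- (void)onFrame:(NSNotification *)notification {
    LeapController *controller = (LeapController *)[notification object];
    LeapFrame *frame = [controller frame:0];
    if ([[frame hands] count] == 0) return;          // no hand found: nothing to do

    LeapHand *hand = [[frame hands] objectAtIndex:0];
    float strength = hand.grabStrength;              // 0.0 = open hand, 1.0 = fist

    if (strength > 0.8f && !self.grabbing) {         // hand closed past threshold -> grab
        self.grabbing = YES;
        [self handleGrabAtPosition:hand.palmPosition];
    } else if (strength < 0.2f && self.grabbing) {   // hand opened past threshold -> release
        self.grabbing = NO;
        [self handleReleaseAtPosition:hand.palmPosition];
    }
}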


Chapter 5

Experiment - User study


In this chapter we describe the details of the user study we conducted by presenting the principles of the experiment itself and explaining thoroughly all the phases that the experiment consists of, from the setup until the final phase: the assessment given by the participants. Finally, we briefly present statistical data about our participants.

5.1 Experimental Set-up

In order to interact with the content and evaluate the interaction, we set up a testing area in which a Mac with a 17" Retina display is placed on a table, easily accessible by the participant sitting in front of it. In front of the laptop, the Leap Motion controller is placed in such a way that a potential cable twist will not lean the device forwards or backwards, affecting the user experience.
The device is attached either to the left or to the right side of the laptop, depending on the participant's dominant hand and their preference.

Figure 5.1: Set-up for left handed mouse users.

Figure 5.2: Set-up for right handed mouse users.

5.2 Experiment's Procedure

In this section, the procedure of the experiment is discussed and thoroughly analyzed. We conducted a user study to test and analyze the Offscreen Interaction. Each user participating in this study should follow the instructions given to them, complete the experimental part and finally provide us with an assessment. The procedure followed for each participant consists of the following parts:
Welcome - Brief explanation of interaction - User Learn Mode - Experiment - Demographic information - Assessment
These parts will be discussed, interpreted and visualized with appropriate figures. The duration of each experiment was calculated to last around 25 minutes, plus time for the instructor to explain the concept and for the participant to complete the final assessment. We ended up with a study that requires approximately 40 minutes. However, the time required for the whole procedure varied, as it depended on how much time the participant used in the User Learn Mode in order to feel comfortable with the offscreen area. We also logged data other than timing and errors, data that might help us understand potential positioning patterns of windows placed in the offscreen area.

5.2.1 Welcome - Brief explanation of interaction

At the beginning of the experiment the experimenter introduces himself, welcomes the participant and explains their rights. Participants are allowed to withdraw from the experiment at any time they want, or if they feel uncomfortable or tired. After that we thank them for participating and helping us in this study and finally introduce them to the purpose of this study, as well as stating the goals we are trying to achieve.
At that point, we start explaining the various aspects of the experiment. We first explain the ways they can interact with the system and the two gestures. Afterwards, the three feedback modes are thoroughly presented in conjunction with an in-depth explanation of the offscreen area. Before proceeding to the actual experiment, we made sure that the participant understood the concepts of the interaction by encouraging them to use the User Learn Mode (5.2.2) in all feedback modes and familiarize themselves before the experiment.
Furthermore, participants were asked to take a close look at the specific applications that they would operate on, as our prototype application could take screen shots of the majority of applications but not all. Thus, we were forced to use specific applications, some of which participants might not be familiar with.

5.2.2 User Learn Mode

For reasons we explain below (5.2.3), it is vital for the participant to spend a few minutes in the User Learn Mode, where no timings or errors are logged. In this part, the user tries in turn the Full Feedback, Single Feedback and No Feedback modes while receiving a brief explanation of the screen visualizations and their components, where applicable. When participants were ready, they could initiate the experiment by pressing the ESC key.
From this point onwards, the experimenter no longer interferes with the participant, as on-screen instructions are given.

5.2.3 Experiment

Principles
The experiment itself consists of cycling through the three feedback modes, where in each mode the participant is asked to operate on a random window sequence. When the experiment is initiated, the three modes follow the Balanced Latin Square algorithm [52]:
Participant | First Mode | Second Mode | Third Mode
1 | Full Feedback | Single Feedback | No Feedback
2 | No Feedback | Full Feedback | Single Feedback
3 | Single Feedback | No Feedback | Full Feedback

Table 5.1: Feedback cycling for the first 3 participants. Afterwards, it repeats itself.
The reason for encouraging participants to spend time in the User Learn Mode is that even-numbered participants (2, 4, 6, 8) would have to conduct the experiment starting with the No Feedback mode, a mode that is considered hard, especially if one does not understand the positions of the boxes in the offscreen area.
The same behavior is applied to the window list. The windows to operate on are stored in an 8x8 structure where each row contains window titles following the Balanced Latin Square algorithm, as shown in Table 5.2.
# | Window Titles
1 | Wnd 1, Wnd 2, Wnd 8, Wnd 3, Wnd 7, Wnd 4, Wnd 6, Wnd 5
2 | Wnd 2, Wnd 3, Wnd 1, Wnd 4, Wnd 8, Wnd 5, Wnd 7, Wnd 6
3 | Wnd 3, Wnd 4, Wnd 2, Wnd 5, Wnd 1, Wnd 6, Wnd 8, Wnd 7
4 | Wnd 4, Wnd 5, Wnd 3, Wnd 6, Wnd 2, Wnd 7, Wnd 1, Wnd 8

Table 5.2: Sample Balanced Latin Square algorithm for windows
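For reference, the rows of such a square can be generated with the classic 1, 2, n, 3, n-1, ... construction, each subsequent row shifted by one. The sketch below is ours, not code from the prototype:

// Generate row r (0-based) of an n-condition balanced Latin square.
NSArray *balancedLatinSquareRow(int n, int r) {
    NSMutableArray *row = [NSMutableArray arrayWithCapacity:n];
    int low = 0, high = 0;
    for (int i = 0; i < n; i++) {
        int cell = (i < 2 || i % 2 != 0) ? low++ : (n - 1 - high++);
        [row addObject:@(((cell + r) % n) + 1)];   // shift by row index, report 1-based
    }
    return row;
}
// balancedLatinSquareRow(8, 0) -> 1, 2, 8, 3, 7, 4, 6, 5 (row 1 of Table 5.2)
// balancedLatinSquareRow(8, 1) -> 2, 3, 1, 4, 8, 5, 7, 6 (row 2)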


Upon experiment start, by pressing the ESC key, a row is picked randomly and the participant is asked to move all windows cognitively into the offscreen area in any order they want. Simultaneously, they get a notification with the number of windows that are still inscreen and their titles (fig. 5.3). In order to allow participants to pick windows of their choice, all windows are shown, randomly resized and then shuffled. This ensures that windows overlap and thus a participant can grab partially shown windows if they desire; Fig. 5.5 demonstrates this concept. When no windows are left visible on the desktop, the prototype asks the participant to show a specific window (fig. 5.4) based on the previously picked row of the window order. At any time the participant can ask the system which window was requested by pressing the ESC key.

Figure 5.3: Notification with remaining windows number and title.

Figure 5.4: Notification showing next window to be fetched from offscreen.

Figure 5.5: Windows after shuffling.

To successfully complete a trial, the participant has to grab the requested application from the offscreen area and then release it in the inscreen area. Two errors can happen:
Wrong window. The participant grabbed a window other than the one requested; this error is visualized by a notification only after a release.
Nothing. The participant performed a grab gesture in a box that does not contain a window. This happens when one makes a gesture in a box which is now empty because that window has already become visible in a previous trial. The error notification is visualized immediately.

Experiment Windows List

The following applications have been chosen for the participants to operate on.

Figure 5.6: Selected windows for the experiment: (a) Chrome, (b) Firefox, (c) Eclipse, (d) Sublime, (e) Xcode, (f) Pages, (g) Skype, (h) Calendar.


We deliberately chose two programs for software development (Xcode, Eclipse), two for Internet browsing (Firefox, Chrome) and two for text editing (Sublime, Pages), because their functionality is similar to each other, enabling us to look for patterns regarding grouping applications in sequential boxes in the offscreen area. We also had to narrow down our options due to the screen shot limitation previously described.

5.2.4 Demographic Information

The Demographic Information form is used to collect non-crucial personal information about the participant. Information such as name, surname and address is not asked for, to avoid any confidentiality issues. The information that needs to be filled in is shown in Table 5.3.
Age | Based on year of birth
Gender | Male or female
Job Title | Participant's work area

Table 5.3: Demographic information


We then ask the participants to state their experience with MacOSX usage and especially whether they use Mission Control to navigate between windows. Finally, we ask them to state how many different tasks they usually perform and to rate their experience with computers. The rating is based on a 5-point Likert scale, from Highly Inexperienced (1) to Highly Experienced (5).
This form can give us clues about whether the factors mentioned below introduce a potentially large variation in the error and time variables during the experiment. In particular:
1. Familiarity with MacOSX
2. Usage of Mission Control
3. Number of tasks
4. Computer experience
The Demographic form, as a subsection of the Assessment form, can be found in Appendix B.

5.2.5 Assessment

Having completed all trials in all feedback modes, participants were instructed to fill in a questionnaire on a five-point Likert scale where they rate, and optionally comment on, each of the feedback modes they encountered. The questionnaire (Appendix C) was divided into three parts, one for each feedback mode. In particular, the participants had to provide an assessment for the following types of feedback:
1) Full feedback, 2) Single feedback, 3) No feedback
The questionnaire assessment has the same structure and is independent of the feedback mode. Each assessment consists of 10 questions, which are explained below along with their importance.
1. Grabbing windows during the experiment was: The participant had to rate how easily they could grab a window in the specific feedback mode. This question will help us identify whether the grab gesture affects question #5.
2. Releasing windows during the experiment was: The participant had to rate how easily they could release a window in the specific feedback mode. This question will help us identify whether the release gesture affects question #6.
3. Mouse movement smoothness during the experiment was: The participant had to rate the smoothness during the experiment. Smoothness is a subjective factor, and with this question we can assess whether the smoothness of the mouse while moving the hand inscreen interferes with which window the user wants to select.
4. Dividing the off screen area cognitively was: The participant had to rate how difficult it was to imagine the offscreen area and split it into boxes. The importance of this question is that we can assess whether splitting the offscreen area into two rows is a feasible solution.
5. Grabbing (offscreen) and releasing (inscreen) the correct window during the experiment was: The participant had to rate how difficult it was to perform an operation from offscreen to inscreen.
6. It was easy to release (offscreen) in the area I wanted: The participant had to rate how difficult it was to target the box they wanted in the offscreen area.
Questions 5 and 6 are of high importance since we get feedback about the nature of the interaction.
7. Arm fatigue: Level of fatigue in the arm.
8. Wrist fatigue: Level of fatigue in the wrist.
Questions 7 and 8 help us understand if we should find alternative ways of interaction for Future Research (Section 8.2).
9. General comfort: We ask the participant what their general impression was regarding comfort. Of course, there is a correlation between this factor and the wrist and arm fatigue factors.
10. Overall, the interaction was: We measure the general impression of the feedback mode. It is measured on a usability scale.
Finally, at the bottom of each page, there is an area designed to allow participants to comment on the interaction, state advantages and disadvantages they might have faced, as well as share their thoughts on improvements, if any. This was not a required section; however, we extracted some very welcome feedback from it.

5.3 Participants

We found 9 participants willing to take part in our study, ranging in age from 23 to 45 years old. We extracted the following information from their demographic forms.
As we can see from Figure 5.7a, participants mostly range between 26 and 30 years old. We also tried to recruit people from older and younger age groups, who would hopefully behave interestingly differently in the experiment. We managed to recruit 2 participants in the 41-45 range and 1 below 25 years old.

Figure 5.7: (a) Distribution of participants' age. (b) Distribution of participants' gender.

Participants were mostly male (56%), as can be observed in Figure 5.7b. To be precise, we had 5 male and 4 female (44%) participants. Four out of nine work as department managers and three were students/developers. All of them indicated that they had used a Mac before, even if they don't own one. Figures 5.8a and 5.8b illustrate that all of our users are within one level of each other in computer experience, and 56% use Mission Control to switch between tasks, invoking it one way or another (mousepad gesture, pressing the F3 button).
Finally, in Figure 5.8c we observe that the majority of participants interact with 3-5 tasks in their daily routine, whilst only one performs more than 5. However, this only implies interaction with these tasks, as it is also possible that windows composing other tasks are open but not being used (inactive).

Figure 5.8: (a) Distribution of participants' experience. (b) Distribution of use of Mission Control. (c) Distribution of simultaneous tasks.


Chapter 6

Results
This chapter presents the results we collected after performing the user study on the Offscreen Interaction. We decided to focus on 2 basic categories, which include the most significant information we collected and analyzed: we present the results for Completion Time and Error Rate. We also present and illustrate the mean values for all the parts of the assessment discussed in Section 5.2.5. We refer to the feedback modes of the trials with the abbreviations presented in Table 6.1.
Feedback mode | Abbreviation
Full | FF
Single | SF
None | NF

Table 6.1: Abbreviations for the feedback modes

6.1 Completion Time

Completion time is a parameter that counts the time a participant spent from the moment they were asked to show a window until they actually showed that window. That means the timer is triggered upon the system notification which informs the participant of the requested window title, and stops when this window is shown on screen. When the correct window is shown, a new system notification is triggered and the participant proceeds with the next timed interaction.
We applied the ANalysis Of VAriance (ANOVA) [4] statistical model, using the aggregated values of time, to determine if the mean values for completion time per feedback mode are statistically different. We could extract basic information about the means by comparing them, but we want to know how the differences in the mean values affect our results and whether these differences are significant.
Using repeated measures ANOVA we found a significant main effect of feedback mode (F(1.064, 8.512) = 9.905, p<0.012, with Greenhouse-Geisser correction). Post-hoc comparisons using Bonferroni-corrected p-values found significant differences between FF and NF (p<0.034) and between SF and NF (p<0.047), but no significant difference between FF and SF at a 95% confidence level.
We found that Mauchly's Test of Sphericity was violated for the feedback modes, which is why the Greenhouse-Geisser correction was applied; the type of mode has a significant effect on completion time. The Grand Mean is 12.903 seconds (STD=3.248s). Full Feedback (M=2.750s, STD=0.426s) was the fastest, followed by Single Feedback (M=5.111s, STD=2.060s), and slowest was No Feedback (M=30.847s, STD=8.733s).
Table 6.2 presents the mean time differences between all possible combinations of feedback modes. As we can see, and as already stated, the mean differences are significant between Full/Single Feedback and No Feedback.
Feedback mode (i) | Feedback mode (j) | Mean Difference (i-j) | Sig.
FF | SF | -2.361 | 0.644
FF | NF | -28.097 | 0.034
SF | NF | -25.736 | 0.047

Table 6.2: Mean time differences between modes

6.2 Error Rate

Error rate is a parameter that counts the number of errors a participant made from the moment they were asked to show a window until they actually showed that window. That means an error is registered when the user is required to show window X but instead grabs and releases window Y, or grabs Nothing. As described before, Nothing is an error where the participant performs a grab gesture within a box that previously contained a window but is now empty. The user is informed about the error by a system notification; otherwise they proceed with the next interaction.
We applied the ANOVA statistical model again, using the aggregated values of error rates, to determine if the mean error values per feedback mode are statistically different.
Using repeated measures ANOVA we found a significant main effect of feedback mode (F(2, 7) = 79.620, p<0.001). Post-hoc comparisons using Bonferroni-corrected p-values found significant differences between FF and NF (p<0.001) and between SF and NF (p<0.001), but no significant difference between FF and SF at a 95% confidence level.
We found that Mauchly's Test of Sphericity is not violated for the feedback modes. The Grand Mean is 0.312 errors (STD=0.035). Full Feedback (M=0.090, STD=0.033) was the least error-prone interaction, followed by Single Feedback (M=0.145, STD=0.062), with the most errors reported for No Feedback (M=0.701, STD=0.062).
Table 6.3 presents the mean error differences between all possible combinations of feedback modes. As we can see, and as already stated, the mean differences are significant between Full/Single Feedback and No Feedback.
Feedback mode (i) | Feedback mode (j) | Mean Difference (i-j) | Sig.
FF | SF | -0.056 | 1
FF | NF | -0.611 | <0.001
SF | NF | -0.556 | 0.047

Table 6.3: Mean error differences between modes

6.3 Peculiar Observations

In the process of applying ANOVA, we observed two cases worth mentioning, cases we were specifically looking for. Two of our participants outperformed the rest in terms of speed in the NF mode. The remaining 7 participants have an average completion time of 38s, whilst the two special cases report 3.63s and 6.88s respectively, while still keeping high error rates of 42.85% and 40%. These two events intrigued us to find the reason behind this observation and, after an in-depth examination of the log file, we discovered a) subjective grouping when positioning windows in the offscreen area and b) a pattern in positioning.

Figure 6.1: Participant 1 fully matched FF, SF, NF.

Figure 6.1 shows that participant one managed to position windows in the same box in all feedback modes. This was achieved deliberately and is not random behavior, as is supported by the log file entries (Listing 6.1). The final result of the grouped applications is shown in Figure 6.2.

Figure 6.2: Participant 1's grouping.

To make the log entries presented below meaningful, we explain them with comments:
1:C:Chrome:(2,0)          // Put Chrome in box (2,0)
1:C:Skype:(0,0)           // Put Skype in box (0,0)
1:C:Sublime Text 2:(0,1)  // Put Sublime Text 2 in box (0,1)
1:C:eclipse:(1,1)         // Put eclipse in box (1,1)
1:C:Pages:(2,1)           // Put Pages in box (2,1)
1:C:Firefox:(3,0)         // Put Firefox in box (3,0)
1:C:Calendar:(2,1)        // Put Calendar in box (2,1), Pages pops out
1:C:Xcode:(0,0)           // Put Xcode in box (0,0), Skype pops out
1:C:Pages:(0,0)           // Put Pages in box (0,0), Xcode pops out
1:C:Xcode:(1,0)           // Put Xcode in box (1,0)
1:C:Skype:(3,1)           // Put Skype in box (3,1)
Listing 6.1: Log entries for Participant 1, NF mode.

Participant 2, on the other hand, followed a different methodology: they filled the bottom row first and then the upper row. We were unable to identify whether positioning in such a pattern, as shown in Figure 6.3, was accidental or planned, as there are no pop-outs (Listing 6.2).
2:C:Pages:(0,0)
2:C:Chrome:(1,0)
2:C:Skype:(2,0)
2:C:eclipse:(3,0)
2:C:Sublime Text 2:(0,1)
2:C:Firefox:(1,1)
2:C:Calendar:(2,1)
2:C:Xcode:(3,1)

Listing 6.2: Log entries for Participant 2, NF mode.


Figure 6.3: Participant 2's potential grouping.

6.4 Subjective Preferences

In Figure 6.4, the mean values for all the parts of the assessment presented in Section 5.2.5 are illustrated. In each diagram the vertical axis represents the mean values of the corresponding category, ranging from 1 to 5 (5-point Likert scale), whilst the X axis represents the mode. Generally, higher values mean that participants rated the specified category positively.
The FF mode repeatedly scored well across all categories, better than any other mode. Precisely, Full Feedback (blue) was ranked highest concerning the Overall Interaction Experience (3.66 / 5, Figure 6.4h) and General Comfort (3.11 / 5, Figure 6.4g). Most importantly, however, NF (orange) performed considerably worse than the rest of the modes across all categories except wrist fatigue (Figure 6.4d), a category in which all modes performed mediocrely (STD of means = 0.231). Arm fatigue (Figure 6.4c), especially in NF, indicates a possible future research topic aimed at diminishing this effect. Finally, grabbing from offscreen/inscreen and releasing inscreen/offscreen respectively, as well as smoothness, which behaves uniformly (mean = 3.92, STD of means = 0.16), ranked high in all modes (Figures 6.4e, 6.4f, 6.4a), which indicates that our prototype application is well structured and responds well to hand movement and gesture identification.

Figure 6.4: Mean values assessed by participants for each feedback mode: (a) Mean Smoothness, (b) Mean Offscreen Area Cognitive Division, (c) Mean Arm Fatigue, (d) Mean Wrist Fatigue, (e) Mean Grab Offscreen / Release Inscreen easiness, (f) Mean Grab Inscreen / Release Offscreen easiness, (g) Mean General Comfort, (h) Mean Overall Interaction Experience.

Chapter 7

Discussion
We conducted a study and presented the results; now we need to discuss and interpret them. What glues the experiment (Chapter 5) and the results (Chapter 6) together are the hypotheses made, thus we commence the discussion by stating the hypotheses and discussing whether they hold.

(H1) Visual feedback (Full and Single) significantly outperforms No Feedback in terms of task completion time.
In Section 6.1 we stated that the No Feedback mode is the slowest mode and we showed that it is 15 times and 6 times slower than the Full and Single modes respectively. In Section 6.3 we presented two cases that achieved really fast completion times in the No Feedback mode. Although their performance was remarkable, their times in No Feedback mode outperformed only one participant's completion time in Single Feedback. That participant performed badly in all modes (e.g. 10 times slower in Single Feedback than the rest) and is therefore considered an isolated event. We believe that although H1 still holds, future research is required to strengthen our belief.
(H2) Error rates in No Feedback are significantly higher than with any visual feedback.
In Chapter 6, Section 6.2, we noted that in the No Feedback mode error rates were huge compared to the feedback modes that give visual information. This was expected, as in the No Feedback mode users have no indication of where in the offscreen area their hand is and whether the requested window is under that box or the box is empty. Although applying patterns during the positioning phase (grouping tasks & positioning methodology) affected completion time, the error rate remained high. Even if the error rate is only composed of the Nothing error (5.2.3), this is still a form of error and we therefore conclude that H2 holds.
(H3) Task completion time in No Feedback mode can be improved by using grouping or a particular spatial positioning.
We analyzed results in Section 6.3, where we presented two cases varying significantly from the others. Indeed, our analysis of the data validates H3, as two participants managed to improve completion time (best=3.73s) by a factor of 5 compared to the fastest remaining participant, who achieved 15.13s.
Taking into consideration the hypotheses discussed above, we perceive that all of them hold. It is, though, imperative to discuss the influence of H3 on the No Feedback mode before we can conclude:
Since our window list was chosen so as to allow grouping of windows, and strengthened by our findings, we could assume directly from H3 that if there is correlation between tasks, systems that provide no feedback might behave not significantly worse than systems with visual feedback. This assumption, though, does not take into consideration error rates and longer-lasting usage of a No Feedback system, which, due to high arm fatigue (Fig. 6.4c), would significantly increase the error rate and user frustration, resulting in an unusable system with slow completion time, prone to errors.
In the additional assessment users provided us with (Appendix C), we asked them to put the 3 different modes in ascending order depending on which was the easiest to use. Users dominantly rejected the No Feedback mode as the most tiring and least effective to use, which clearly depicts user preferences. Observing the results we got from the error and completion time parameters, we realize that No Feedback is outperformed by all other modes on all measured parameters. Results indicate no significant difference between Full Feedback and Single Feedback, but nonetheless Full Feedback is slightly more efficient.
Our findings contradict the findings of Gustafson et al. [17], who developed and studied a viable system without any visual feedback. This contradiction is explained by the fact that our interaction is a mixed interaction between a computer screen and spatial space, whilst their system interacts only in the spatial space. Similarly, Po et al. [45] state, and we quote:
Our findings suggest that pointing without visual feedback is a potentially viable technique for spatial interaction with large-screen display environments that should be seriously considered.
We would like here to differentiate our study from studies on large-screen displays, as our mechanics don't apply to such displays, a research area that we address in the section below.
After stating the above observations, and combining H1, H2 and H3, we conclude that a) a system for interacting between desktop windows and the offscreen space can be seamlessly achieved as long as there is a form of visual feedback to the user, and b) the existence of some (subjective) form of grouping windows or a positioning methodology in the offscreen area decreases the interaction time, but not the error rate, when no visual feedback is provided.


Chapter 8

Conclusion - Future Work


This paper has presented an alternative 2.5D interaction technique for task switching operations on the MacOSX operating system using an external, portable device for gesture recognition. In this chapter we present our conclusions based on our findings, as well as research areas that are consequently derived or areas that we didn't research but believe should be.

8.1 Conclusion

In this study we proposed Offscreen Interaction, an alternative interaction technique for manipulating (hiding and showing) open windows, which provides an effective task switching paradigm by placing and retrieving tasks in the spatial space around a computer screen. This spatial space is defined by the cone-shaped area in which the Leap Motion Controller, a depth-sensing device, operates, and allows us to obtain information about hands and fingers. The main objectives were to investigate the role of visual feedback, how its absence or presence affects the interaction, and to examine whether grouping windows improves the interaction when visual feedback is absent. Current approaches and implementations were presented at the very start of this paper, along with the initial problem that intrigued us to build this new technique for switching between windows in a MacOSX environment. Relevant work and research that has already been conducted in the same or related fields was presented afterwards. We separated it into themes, although there is no clear distinction, and explained how our approach differs.
We then presented the design of our implementation along with important concepts, and thoroughly explained our choices by grounding them in recommendations from previous works. To test and study the offscreen interaction technique, we developed a prototype application. The implementation, architecture, description of the development frameworks, workflows and hardware specifications of the prototype application were presented in a later chapter. We continued by analyzing

the procedure and the context of the user study. We had 9 participants taking part in this study. Each participant had to go through the verbal instructions, fill in the Demographic Information form, familiarize themselves with the interaction through the User Learn Mode (5.2.2), complete the experiment in the three feedback modes (3.4) with 8 open windows, and finally fill in an assessment form for each of the 3 different modes. We then presented statistical information regarding the demographic information of the participants that took part in our study.
We then presented the computed results for the two basic parameters: completion time and error rate. We stated our observations from the log file regarding positioning in the offscreen area, followed by their significance. The results and arguments on the validity of the hypotheses were also discussed.
Finally, we provided evidence and support that all the hypotheses hold and concluded that a) a system for interacting between desktop windows and the offscreen space is feasible given that the user receives some form of feedback, and b) no feedback is a feasible solution for short-term usage when windows are placed offscreen after planning, but it is not optimal.

8.2 Future Work

Based on the results obtained, future work includes several improvements and additional features to the current prototype to achieve a seamless interaction. One way to achieve improvement would be to minimize fatigue as much as possible and increase general comfort. Given that our prototype introduces no significant wrist fatigue, we believe that translating mouse movement to wrist movement would be a promising topic for researching the interaction's behavior when utilizing the wrist and not the whole arm.
Although the No Feedback mode gives no visual feedback to the user, the user indirectly gets some form of feedback from the mouse cursor. This indirect feedback can be researched to identify to what extent we can get useful information about the X position in the offscreen area. We know that when the hand is in the offscreen area, the mouse cursor is on the top edge of the screen. If there were a mechanism that snapped the mouse cursor to the X center of the underlying imaginary box, then the user would know exactly which box their hand is pointing at.
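A minimal sketch of such a snapping mechanism, reusing the box arithmetic of Section 4.9.2 and the mouse-event call of Appendix A.1 (the function itself is hypothetical, not part of the prototype):

// While the hand is offscreen, pin the cursor to the centre of the imaginary box
// under the hand, so the cursor X indirectly reveals the box index.
static void snapCursorToBoxCenter(int appX, int screenWidth) {
    int boxesPerRow = NO_OF_BOXES / 2;
    int boxWidth    = screenWidth / boxesPerRow;
    int boxX        = appX / boxWidth;                 // box column under the hand
    int centerX     = boxX * boxWidth + boxWidth / 2;  // centre of that box
    CGEventRef move = CGEventCreateMouseEvent(NULL, kCGEventMouseMoved,
                                              CGPointMake(centerX, 0), kCGMouseButtonLeft);
    CGEventPost(kCGHIDEventTap, move);                 // cursor stays on the top edge
    CFRelease(move);
}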
Since the Leap Controller is connected via USB, this allows a maximum distance of 30 meters with repeaters, or even further through USB-over-Ethernet technology. Therefore, another research area would be to investigate further the interaction with large displays from a greater distance. Finally, we would like to introduce more offscreen areas, not only above the screen but also to the left and right, which would give more spatial space for the interaction. This would allow us to support more windows in order to investigate further the influence of grouping windows when no visual feedback is given.


Bibliography
[1] Leap Motion Controller (accessed on 6 February 2015). https://www.leapmotion.com.
[2] Leap Motion Controller Developer (accessed on 6 February 2015). https://developer.leapmotion.com/documentation/objc/index.html.
[3] Wei Tech Ang, Cameron N Riviere, and Pradeep K Khosla. Design and implementation of active error canceling in hand-held microsurgical instrument. In Intelligent Robots and Systems, 2001. Proceedings. 2001 IEEE/RSJ International Conference on, volume 2, pages 1106–1111. IEEE, 2001.
[4] ANalysis Of Variance ANOVA. http://en.wikipedia.org/wiki/Analysis_of_variance.
[5] Accessibility API. https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/Accessibility/cocoaAXIntro/cocoaAXintro.html.
[6] Carbon API. http://en.wikipedia.org/wiki/Carbon_%28API%29.
[7] AppleScript. https://developer.apple.com/library/mac/documentation/AppleScript/Conceptual/AppleScriptLangGuide/introduction/ASLR_intro.html.
[8] Greg J. Badros, Jeffrey Nichols, and Alan Borning. Scwm: An extensible constraint-enabled window manager. In Clem Cole, editor, USENIX Annual Technical Conference, FREENIX Track, pages 225–234. USENIX, 2001.
[9] Hrvoje Benko, Ricardo Jota, and Andrew Wilson. Miragetable: freehand interaction on a projected augmented reality tabletop. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 199–208. ACM, 2012.
[10] Xiaojun Bi, Tovi Grossman, Justin Matejka, and George Fitzmaurice. Magic desk: bringing multi-touch surfaces into desktop work. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 2511–2520. ACM, 2011.
[11] Sebastian Boring, Dominikus Baur, Andreas Butz, Sean Gustafson, and Patrick Baudisch. Touch projector: mobile interaction through video. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2287–2296. ACM, 2010.
[12] Xiang 'Anthony' Chen, Nicolai Marquardt, Anthony Tang, Sebastian Boring, and Saul Greenberg. Extending a mobile device's interaction space through body-centric interaction. In Proceedings of the 14th international conference on Human-computer interaction with mobile devices and services, pages 151–160. ACM, 2012.
[13] Cocoa Framework. https://developer.apple.com/technologies/mac/cocoa.html.
[14] Luigi Gallo, Giuseppe De Pietro, and Ivana Marra. 3d interaction with volumetric medical data: experiencing the wiimote. In Proceedings of the 1st international conference on Ambient media and systems, page 14. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2008.
[15] Lee Garber. Gestural technology: Moving interfaces in a new direction [technology news]. Computer, 46(10):22–25, 2013.
[16] Joze Guna, Grega Jakus, Matevz Pogacnik, Saso Tomazic, and Jaka Sodnik. An analysis of the precision and reliability of the leap motion sensor and its suitability for static and dynamic tracking. Sensors, 14(2):3702–3720, 2014.
[17] Sean Gustafson, Daniel Bierwirth, and Patrick Baudisch. Imaginary interfaces: Spatial interaction with empty hands and without visual feedback. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, UIST '10, pages 3–12, New York, NY, USA, 2010. ACM.
[18] Mark Hancock, Sheelagh Carpendale, and Andy Cockburn. Shallow-depth 3d interaction: design and evaluation of one-, two- and three-touch techniques. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 1147–1156. ACM, 2007.
[19] Mark Hancock, Thomas Ten Cate, and Sheelagh Carpendale. Sticky tools: full 6dof force-based interaction for multi-touch tables. In Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces, pages 133–140. ACM, 2009.
[20] Doris Hausen, Sebastian Boring, and Saul Greenberg. The unadorned desk: Exploiting the physical space around a display as an input canvas. In Human-Computer Interaction – INTERACT 2013, pages 140–158. Springer, 2013.
[21] D Austin Henderson Jr and Stuart Card. Rooms: the use of multiple virtual workspaces to reduce space contention in a window-based graphical user interface. ACM Transactions on Graphics (TOG), 5(3):211–243, 1986.
[22] Otmar Hilliges, David Kim, Shahram Izadi, Malte Weiss, and Andrew Wilson. Holodesk: direct 3d interactions with a situated see-through display. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2421–2430. ACM, 2012.

[23] Dugald Ralph Hutchings, Greg Smith, Brian Meyers, Mary Czerwinski, and George Robertson. Display space usage and window management operation comparisons between single monitor and multiple monitor users. In Proceedings of the working conference on Advanced visual interfaces, pages 32–39. ACM, 2004.
[24] Dugald Ralph Hutchings and John Stasko. Quickspace: New operations for the desktop metaphor. In CHI '02 extended abstracts on Human factors in computing systems, pages 802–803. ACM, 2002.
[25] Dugald Ralph Hutchings and John T Stasko. New operations for display space management and window management. 2002.
[26] Shahram Izadi, David Kim, Otmar Hilliges, David Molyneaux, Richard Newcombe, Pushmeet Kohli, Jamie Shotton, Steve Hodges, Dustin Freeman, Andrew Davison, et al. Kinectfusion: real-time 3d reconstruction and interaction using a moving depth camera. In Proceedings of the 24th annual ACM symposium on User interface software and technology, pages 559–568. ACM, 2011.
[27] Samuel Jimenez. Physical interaction in augmented environments.
[28] Pan Jing and Guan Yepeng. Human-computer interaction using pointing gesture based on an adaptive virtual touch screen. International Journal of Signal Processing, Image Processing, 6(4):81–92, 2013.
[29] Eser Kandogan and Ben Shneiderman. Elastic windows: Improved spatial layout and rapid multiple window operations. In Proceedings of the Workshop on Advanced Visual Interfaces, AVI '96, pages 29–38, New York, NY, USA, 1996. ACM.
[30] Abhishek Kar. Skeletal tracking using microsoft kinect. Methodology, 1:1–11, 2010.
[31] Jeroen Keijser, Sheelagh Carpendale, Mark Hancock, and Tobias Isenberg. Exploring 3d interaction in alternate control-display space mappings. In 3D User Interfaces, 2007. 3DUI '07. IEEE Symposium on. IEEE, 2007.
[32] David Kim, Otmar Hilliges, Shahram Izadi, Alex D Butler, Jiawen Chen, Iason Oikonomidis, and Patrick Olivier. Digits: freehand 3d interactions anywhere using a wrist-worn gloveless sensor. In Proceedings of the 25th annual ACM symposium on User interface software and technology, pages 167–176. ACM, 2012.
[33] Jinha Lee, Alex Olwal, Hiroshi Ishii, and Cati Boulanger. Spacetop: integrating 2d and spatial 3d interactions in a see-through desktop environment. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 189–192. ACM, 2013.
[34] SK Lee, William Buxton, and KC Smith. A multi-touch three dimensional touch-sensitive tablet. In ACM SIGCHI Bulletin, volume 16, pages 21–25. ACM, 1985.
[35] Frank Chun Yat Li, David Dearman, and Khai N Truong. Virtual shelves: interactions with orientation aware devices. In Proceedings of the 22nd annual ACM symposium on User interface software and technology, pages 125–128. ACM, 2009.
[36] Shahzad Malik and Joe Laszlo. Visual touchpad: a two-handed gestural input device. In Proceedings of the 6th international conference on Multimodal interfaces, pages 289–296. ACM, 2004.
[37] Shahzad Malik, Abhishek Ranjan, and Ravin Balakrishnan. Interacting with large displays from a distance with vision-tracked multi-finger gestural input. In Proceedings of the 18th annual ACM symposium on User interface software and technology, pages 43–52. ACM, 2005.
[38] Tomer Moscovich and John F Hughes. Multi-finger cursor techniques. In Proceedings of Graphics Interface 2006, pages 1–7. Canadian Information Processing Society, 2006.
[39] Brad A. Myers, Robert C. Miller, Benjamin Bostwick, and Carl Evankovich. Extending the windows desktop interface with connected handheld computers. In Proceedings of the 4th Conference on USENIX Windows Systems Symposium - Volume 4, WSS '00, pages 8–8, Berkeley, CA, USA, 2000. USENIX Association.
[40] Objective-C. http://cocoadevcentral.com/d/learn_objectivec.
[41] Dan R Olsen Jr and Travis Nielsen. Laser pointer interaction. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 17–22. ACM, 2001.
[42] OptiTrack. https://www.naturalpoint.com/optitrack/systems/#motive-tracker/flex-3/16.
[43] Thammathip Piumsomboon, Adrian Clark, and Mark Billinghurst. Physically-based interaction for tabletop augmented reality using a depth-sensing camera for environment mapping. 2011.
[44] Thammathip Piumsomboon, Adrian Clark, Mark Billinghurst, and Andy Cockburn. User-defined gestures for augmented reality. In Human-Computer Interaction – INTERACT 2013, pages 282–299. Springer, 2013.
[45] Barry A Po, Brian D Fisher, and Kellogg S Booth. Pointing and visual feedback for spatial interaction in large-screen display environments. In Smart Graphics, pages 22–38. Springer, 2003.
[46] Quartz. https://developer.apple.com/technologies/mac/graphics-and-animation.html.
[47] George Robertson, Eric Horvitz, Mary Czerwinski, Patrick Baudisch, Dugald Ralph Hutchings, Brian Meyers, Daniel Robbins, and Greg Smith. Scalable fabric: flexible task management. In Proceedings of the working conference on Advanced visual interfaces, pages 85–89. ACM, 2004.

[48] George Robertson, Maarten van Dantzich, Daniel Robbins, Mary Czerwinski, Ken Hinckley, Kirsten Risden, David Thiel, and Vadim Gorokhovsky. The task gallery: A 3d window manager. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '00, pages 494–501, New York, NY, USA, 2000. ACM.
[49] LEAP SDK. https://developer.leapmotion.com/documentation/csharp/devguide/Leap_Overview.html.
[50] Greg Smith, Patrick Baudisch, George Robertson, Mary Czerwinski, Brian Meyers, Daniel Robbins, and Donna Andrews. Groupbar: The taskbar evolved. In Proceedings of OZCHI, volume 3, page 10, 2003.
[51] Martin Spindler, Wolfgang Büschel, and Raimund Dachselt. Use your head: tangible windows for 3d information spaces in a tabletop environment. In Proceedings of the 2012 ACM international conference on Interactive tabletops and surfaces, pages 245–254. ACM, 2012.
[52] Balanced Latin Square. http://en.wikipedia.org/wiki/Latin_square.
[53] TSeries Vicon. http://www.vicon.com/System/TSeries.
[54] Manuela Waldner, Markus Steinberger, Raphael Grasset, and Dieter Schmalstieg. Importance-driven compositing window management. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 959–968. ACM, 2011.
[55] Robert Wang, Sylvain Paris, and Jovan Popović. 6d hands: markerless hand-tracking for computer aided design. In Proceedings of the 24th annual ACM symposium on User interface software and technology, pages 549–558. ACM, 2011.
[56] Frank Weichert, Daniel Bachmann, Bartholomäus Rudak, and Denis Fisseler. Analysis of the accuracy and robustness of the leap motion controller. Sensors, 13(5):6380–6393, 2013.
[57] Thibaut Weise, Sofien Bouaziz, Hao Li, and Mark Pauly. Realtime performance-based facial animation. In ACM Transactions on Graphics (TOG), volume 30, page 77. ACM, 2011.
[58] Pierre Wellner. Interacting with paper on the digitaldesk. Communications of the ACM, 36(7):87–96, 1993.
[59] Quan Xu and Géry Casiez. Push-and-pull switching: window switching based on window overlapping. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1335–1338. ACM, 2010.
[60] Shota Yamanaka and Homei Miyashita. Switchback cursor: mouse cursor operation for overlapped windowing. In Human-Computer Interaction – INTERACT 2013, pages 746–753. Springer, 2013.


Appendix A

Code snippets
A.1 Coordinate translation & mouse movement

LeapInteractionBox *iBox = frame.interactionBox;
// Normalize the Leap point to [0..1] within the interaction box
LeapVector *normalizedPoint = [iBox normalizePoint:leapPoint clamp:YES];
int appX = normalizedPoint.x * screenWidth;
int appY = normalizedPoint.y * screenHeight;
appY = screenHeight - appY;   // flip Y axis: Leap grows upwards, screen grows downwards
CGEventRef move1 = CGEventCreateMouseEvent(NULL, kCGEventMouseMoved,
                                           CGPointMake(appX, appY), kCGMouseButtonLeft);
CGEventPost(kCGHIDEventTap, move1);

A.2 Screen point conversion

- (CGPoint)carbonScreenPointFromCocoaScreenPoint:(NSPoint)cocoaPoint {
    NSScreen *foundScreen = nil;
    CGPoint thePoint;
    // Find the screen that contains the Cocoa point
    for (NSScreen *screen in [NSScreen screens]) {
        if (NSPointInRect(cocoaPoint, [screen frame])) {
            foundScreen = screen;
        }
    }
    if (foundScreen) {
        // Cocoa's origin is bottom-left; Carbon/Quartz expects top-left
        thePoint = CGPointMake(cocoaPoint.x, screenHeight - cocoaPoint.y - 1);
    } else {
        thePoint = CGPointMake(0.0, 0.0);
    }
    return thePoint;
}

A.3 Inscreen area

if (appY >= 0 && appY <= screenHeight && appX > 0 && appX <= screenWidth)
{
    // Do rest of algorithm
}

A.4 Offscreen area

if (appY == 0 && yy >= LOWER_BOUND && yy <= UPPER_BOUND && appX > 0 && appX < screenWidth)
{
    // Do rest of algorithm
}

A.5 Upon grab, extract windows under cursor information

- (void)updateCurrentUIElement {
    NSPoint cocoaPoint = [NSEvent mouseLocation];
    if (!NSEqualPoints(cocoaPoint, _lastMousePoint)) {
        CGPoint pointAsCGPoint = [self carbonScreenPointFromCocoaScreenPoint:cocoaPoint];
        AXUIElementRef newElement = NULL;
        // Ask the Accessibility API for the UI element under the cursor
        if (AXUIElementCopyElementAtPosition(_systemWideElement,
                pointAsCGPoint.x, pointAsCGPoint.y, &newElement) == kAXErrorSuccess
            && newElement
            && ([self currentAppID] != [self extractPIDFromUIElement:newElement])
            && ([self currentUIElement] == NULL
                || !CFEqual([self currentUIElement], newElement))) {
            [self setCurrentUIElement:newElement];
            [self setCurrentAppID:[self extractPIDFromUIElement:newElement]];
            // Cache a screen shot of the window being grabbed
            NSImage *tmpImg = [self generateImage:[self getTitle]];
            [self setCurrentImage:tmpImg];
        }
        _lastMousePoint = cocoaPoint;
    }
}

A.6 Shuffle windows before experiment starts

- (BOOL)bringWindowWithNameToFront:(NSString *)windowName
{
    CFArrayRef windowList = CGWindowListCopyWindowInfo(kCGWindowListOptionAll,
                                                       kCGNullWindowID);
    BOOL returnValue = FALSE;
    for (NSMutableDictionary *entry in (__bridge NSArray *)windowList) {
        NSString *currentWindow = [entry objectForKey:(id)kCGWindowOwnerName];
        NSString *subWindowName = [entry objectForKey:(id)kCGWindowName];
        if ((currentWindow != NULL) && (subWindowName.length > 0)
            && ([currentWindow containsString:windowName])) {
            NSString *applicationName = [entry objectForKey:(id)kCGWindowOwnerName];
            int pid = [[entry objectForKey:(id)kCGWindowOwnerPID] intValue];
            NSRunningApplication *app = [NSRunningApplication
                runningApplicationWithProcessIdentifier:pid];
            [app activateWithOptions:NSApplicationActivateAllWindows];
            // Raise the specific window via AppleScript and activate its application
            NSString *script = [NSString stringWithFormat:@"tell application \"System Events\" to tell process \"%@\" to perform action \"AXRaise\" of window \"%@\"\ntell application \"%@\" to activate\n", applicationName, subWindowName, applicationName];
            NSAppleScript *as = [[NSAppleScript alloc] initWithSource:script];
            [as compileAndReturnError:NULL];
            [as executeAndReturnError:NULL];
            returnValue = TRUE;
        }
    }
    CFRelease(windowList);
    return returnValue;
}

A.7 Extract the title

- (NSString *)getTitle {
    CFTypeRef _title;
    AXUIElementRef app = AXUIElementCreateApplication([self currentAppID]);
    // Read the accessibility title attribute of the current application element
    if (AXUIElementCopyAttributeValue(app, (CFStringRef)NSAccessibilityTitleAttribute,
                                      (CFTypeRef *)&_title) == kAXErrorSuccess) {
        NSString *title = (__bridge NSString *)_title;
        return title;
    }
    return @"";
}

A.8 Generate image

- (NSImage *)generateImage:(NSString *)appTitle
{
    CGWindowID windowID = 0;
    NSArray *myWindowList = (__bridge NSArray *)
        CGWindowListCopyWindowInfo(kCGWindowListOptionOnScreenOnly, kCGNullWindowID);
    // Locate the window ID belonging to the requested application
    for (NSDictionary *info in myWindowList) {
        if ([[info objectForKey:(NSString *)kCGWindowOwnerName] containsString:appTitle]
            && ![[info objectForKey:(NSString *)kCGWindowName] isEqualToString:@""]) {
            windowID = [[info objectForKey:(NSString *)kCGWindowNumber] unsignedIntValue];
        }
    }
    imageOptions = kCGWindowImageDefault;
    singleWindowListOptions = kCGWindowListOptionIncludingWindow;
    imageBounds = CGRectNull;
    // Take a screen shot of just that window
    CGImageRef windowImage = CGWindowListCreateImage(imageBounds,
        singleWindowListOptions, windowID, imageOptions);
    NSBitmapImageRep *bitmapRep = [[NSBitmapImageRep alloc] initWithCGImage:windowImage];
    NSImage *image = [[NSImage alloc] init];
    [image addRepresentation:bitmapRep];
    return image;
}

A.9 Move window (Single Feedback)

- (void)setPosition:(NSNotification *)notification {
    if ([notification.object isKindOfClass:[NSNumber class]]) {
        NSNumber *message = [notification object];
        // Translate the hand's X position into a box index, then into a window X offset
        NSNumber *boxPos = [NSNumber numberWithInt:(message.integerValue /
                                                    screenWidth) * NO_OF_BOXES];
        NSNumber *moveTo = [boxes objectForKey:[NSNumber numberWithInt:
                                                boxPos.integerValue]];
        NSPoint pos;
        pos.x = [moveTo floatValue];
        pos.y = screenHeight;
        [self.window setFrame:CGRectMake(pos.x, pos.y, [_myWindow frame].size.width,
                                         [_myWindow frame].size.height) display:YES];
    }
}

where the boxes variable is an NSMutableDictionary implemented as

boxes = [NSMutableDictionary dictionary];
[boxes setObject:[NSNumber numberWithFloat:(screenWidth / NO_OF_BOXES) * 0] forKey:[NSNumber numberWithInt:0]];
[boxes setObject:[NSNumber numberWithFloat:(screenWidth / NO_OF_BOXES) * 1] forKey:[NSNumber numberWithInt:1]];
[boxes setObject:[NSNumber numberWithFloat:(screenWidth / NO_OF_BOXES) * 2] forKey:[NSNumber numberWithInt:2]];
[boxes setObject:[NSNumber numberWithFloat:(screenWidth / NO_OF_BOXES) * 3] forKey:[NSNumber numberWithInt:3]];


Appendix B

Demographic Information Form

Off-screen interaction with LEAP
General Information & Interaction Assessment

Participant ID: _____________

General Information
Age: _____   Gender: _____   Job title: _____
Do you own a Macintosh?  Yes ☐  No ☐
Have you ever used a Macintosh before?  Yes ☐  No ☐
Are you using Mission Control to navigate between windows?  Yes ☐  No ☐
How many tasks do you usually perform simultaneously?  <2 ☐  3-5 ☐  >5 ☐
How do you rate your experience with computers?  Highly inexperienced ☐ ☐ ☐ ☐ ☐ Highly experienced

Appendix C

Technique Assessment Form


Interaction Assessment (1)

Full feedback (One guiding window, showing all areas)
1. Grabbing windows during the experiment was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
2. Releasing windows during the experiment was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
3. Mouse movement smoothness during experiment was: Very rough ☐ ☐ ☐ ☐ ☐ Very smooth
4. Dividing the off screen area cognitively was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
5. Grabbing (offscreen) and releasing (inscreen) the correct window during the experiment was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
6. It was easy to release (offscreen) in the area I wanted: Disagree ☐ ☐ ☐ ☐ ☐ Agree
7. Arm fatigue: None ☐ ☐ ☐ ☐ ☐ Very high
8. Wrist fatigue: None ☐ ☐ ☐ ☐ ☐ Very high
9. General comfort: Very uncomfortable ☐ ☐ ☐ ☐ ☐ Very comfortable
10. Overall, the interaction was: Very difficult to use ☐ ☐ ☐ ☐ ☐ Very easy to use
Please state your own comments (advantages / disadvantages) below:

Interaction Assessment (2)

Single feedback (One guiding window, showing one area)
1. Grabbing windows during the experiment was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
2. Releasing windows during the experiment was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
3. Mouse movement smoothness during experiment was: Very rough ☐ ☐ ☐ ☐ ☐ Very smooth
4. Dividing the off screen area cognitively was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
5. Grabbing (offscreen) and releasing (inscreen) the correct window during the experiment was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
6. It was easy to release (offscreen) in the area I wanted: Disagree ☐ ☐ ☐ ☐ ☐ Agree
7. Arm fatigue: None ☐ ☐ ☐ ☐ ☐ Very high
8. Wrist fatigue: None ☐ ☐ ☐ ☐ ☐ Very high
9. General comfort: Very uncomfortable ☐ ☐ ☐ ☐ ☐ Very comfortable
10. Overall, the interaction was: Very difficult to use ☐ ☐ ☐ ☐ ☐ Very easy to use
Please state your own comments (advantages / disadvantages) below:

Interaction Assessment (3)

No feedback (no guiding window)
1. Grabbing windows during the experiment was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
2. Releasing windows during the experiment was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
3. Mouse movement smoothness during experiment was: Very rough ☐ ☐ ☐ ☐ ☐ Very smooth
4. Dividing the off screen area cognitively was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
5. Grabbing (offscreen) and releasing (inscreen) the correct window during the experiment was: Very difficult ☐ ☐ ☐ ☐ ☐ Very easy
6. It was easy to release (offscreen) in the area I wanted: Disagree ☐ ☐ ☐ ☐ ☐ Agree
7. Arm fatigue: None ☐ ☐ ☐ ☐ ☐ Very high
8. Wrist fatigue: None ☐ ☐ ☐ ☐ ☐ Very high
9. General comfort: Very uncomfortable ☐ ☐ ☐ ☐ ☐ Very comfortable
10. Overall, the interaction was: Very difficult to use ☐ ☐ ☐ ☐ ☐ Very easy to use
Please state your own comments (advantages / disadvantages) below:

Ranking
Please rank the aforementioned systems:
Full Feedback: _______
Single Feedback: _______
No feedback: _______

Additional general feedback (optional):
