Overview: Distributed Data Usage Control
Distributed Usage Control is concerned with the problem of how to manage the usage of data once it has been given away. In that sense, it extends access control to the time after access to data, adds obligations to rights, and considers data access and usage in distributed systems. A typical example of distributed usage control is digital rights management, but the concept is far more powerful and, if applied to privacy or IP, potentially less controversial.
Our work is based on three key ideas. The first idea is that if “data” is to be protected, we usually mean all representations of that data: if we download a picture from the internet, it exists at least as network packets, as a DOM object, as a pixmap, and as a cache file. This means that usage control needs to be done in conjunction with information flow tracking.
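The link between representations and information flow tracking can be illustrated with a minimal sketch. It assumes a simple model in which each representation is a named container and every copy operation propagates the data items held by the source to the destination. The class `DataFlowTracker` and the container names are hypothetical and chosen only to mirror the picture example above; they are not taken from any of the implementations listed below.

```python
from collections import defaultdict


class DataFlowTracker:
    """Hypothetical tracker: records which containers hold which data items."""

    def __init__(self):
        # container name -> set of data item identifiers stored in it
        self.storage = defaultdict(set)

    def initialize(self, container, data_item):
        """Register the first representation of a data item."""
        self.storage[container].add(data_item)

    def flow(self, source, destination):
        """Model a copy: the destination now also holds everything in the source."""
        self.storage[destination] |= self.storage[source]

    def representations_of(self, data_item):
        """Return all containers that currently hold the data item."""
        return sorted(c for c, items in self.storage.items() if data_item in items)


tracker = DataFlowTracker()
tracker.initialize("network_packets", "picture")
tracker.flow("network_packets", "dom_object")
tracker.flow("dom_object", "pixmap")
tracker.flow("network_packets", "cache_file")

representations = tracker.representations_of("picture")
print(representations)
```

A policy that refers to the data item "picture" can then be evaluated against every container returned by `representations_of`, rather than against a single file or object.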
The second idea is that these representations exist at different layers of a system stack: at the level of the operating system, at the level of a window manager, at the level of applications such as databases, browsers, text editors, email programs, spreadsheets, and so on. We could, of course, perform usage control at the level of the operating system alone, but then we usually lose the semantics of usage actions such as “print,” “forward email,” “take screenshot,” or “upload.” In addition to extending usage control with information flow concepts, our work therefore centers around the idea of enforcing usage control requirements within the different layers of the system stack, in a way that we plug in our usage control technology rather than modifying the respective layers of the system.
The third idea is that “usage control requirements” come both in declarative and operational forms because usage control requirements can be enforced in different ways. We hence distinguish between specification-level policies (“do not send non-anonymized data without notifying an admin”) and implementation-level policies (“block non-anonymized data;” “anonymize data before sending;” “notify admin when sending non-anonymized data”). Because we cater to different representations of data, we have developed a policy language that distinguishes between data items and all representations of a data item. Because we cater to different layers of the system stack, we need to provide system-level semantics to data usage actions: “save” means different things at the level of a web browser and the level of an operating system.
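As a rough illustration of how one specification-level policy can be enforced in several ways, the sketch below encodes the three implementation-level policies from the example as functions over a hypothetical `SendEvent`. The function names follow the enforcement forms named in the bibliography (inhibition, modification, execution). All types, names, and the placeholder anonymization are assumptions made for this example, not part of the policy language developed in this work.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class SendEvent:
    """A hypothetical 'send' usage action intercepted by a monitor."""
    payload: str
    anonymized: bool


def inhibit(event: SendEvent, notifications: List[str]) -> Optional[SendEvent]:
    """Implementation-level policy: block non-anonymized data."""
    return event if event.anonymized else None


def modify(event: SendEvent, notifications: List[str]) -> Optional[SendEvent]:
    """Implementation-level policy: anonymize data before sending."""
    if event.anonymized:
        return event
    # Placeholder anonymization; a real mechanism would transform the payload.
    return replace(event, payload="<anonymized>", anonymized=True)


def execute(event: SendEvent, notifications: List[str]) -> Optional[SendEvent]:
    """Implementation-level policy: notify admin when sending non-anonymized data."""
    if not event.anonymized:
        notifications.append("admin: non-anonymized data sent")
    return event


def satisfies_specification(sent: Optional[SendEvent], notifications: List[str]) -> bool:
    """Specification-level policy: do not send non-anonymized data
    without notifying an admin."""
    return sent is None or sent.anonymized or len(notifications) > 0


results = {}
for mechanism in (inhibit, modify, execute):
    notifications: List[str] = []
    sent = mechanism(SendEvent("alice, 1980-01-01", anonymized=False), notifications)
    results[mechanism.__name__] = satisfies_specification(sent, notifications)

print(results)
```

Each of the three mechanisms satisfies the same declarative requirement, which is why the choice among them can be left to the enforcing system.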
Dissertations
Completed dissertations address the connection between information flow control and system-wide data usage control across different levels of abstraction of a system (Lovat 2015, Fromm 2020), data usage control in distributed systems (Kelbert 2016), data usage control for privacy-aware camera surveillance (Birnstill 2016), and the derivation of machine-understandable policies from human-understandable policies (Kumari 2015). Bier's interdisciplinary work (2017) examines the negative impact of data discovery systems on privacy. Kacianka's thesis (2022) carries the ideas further to accountability in socio-technical systems. Zieglmeier (2024) relates detective enforcement to the idea of inverse transparency in the context of people analytics.
Bibliography
Usage control requirements and policies
[HBP05] formalizes the idea of usage control requirements as obligations. [HPSW06] analyzes usage control requirements in mobile contexts. [HPBSW07a] defines OSL, a language for usage control policies based on linear-time temporal logic. [PW08] looks into the problem of negotiating usage control policies if an enforcement infrastructure provides mechanisms of enforcement that cannot fully enforce a given usage control policy. [PRSW09] defines and implements strategies to formally analyze usage control policies for contradictions, subsumptions, and implementation relationships between specification-level and implementation-level policies. [PSSW09] formally defines how to soundly modify policies when forwarding data items to other systems, by at most reducing rights and increasing obligations. [K09] elicits usage control requirements for social networks. [KP12] introduces the idea of specification-level and implementation-level policies and shows how to systematically derive the latter. [KP13] shows how to cater to different levels of the system stack when defining specification-level and implementation-level policies in a model-based way.
Usage control architectures
[PMH07] sketches a logical architecture for distributed usage control systems. [BABHPSZ07] develops a technical architecture and considers hardening the system on the grounds of the TPM. [LPFKMP08] introduces the idea of usage control to service-oriented architectures. [WMF13] and [K13] extend the idea to the cloud.
Usage control enforcement
[HPSW07] sketches general ideas for usage control enforcement without information flow tracking. [HPBSW07b] develops runtime monitors for OSL, the obligation specification language. [PHSSW08] provides an overview of the field of usage control enforcement, still without the idea of usage control with information flow. [PHBSW08] defines and formalizes different forms of usage control enforcement as inhibition, execution, and modification. [NPdG11] and [NPdG13] present a performance analysis of usage control monitors.
Usage control with data flow tracking
[PLB12] introduces the idea of representation-independent usage control and thus marries the idea of information flow tracking to usage control. [LOP14] introduces the idea of general quantitative data flow tracking to be incorporated into quantitative usage control policies. [LK14] shows how exploiting the structure of data can improve the precision of usage control with information flow tracking. [LFMP15] shows how to perform usage control enforcement with information flow across the layers of a system by using both static and dynamic information flow analysis tools. [LOP16] shows how to connect usage control monitors at different levels of abstraction.
Distributed usage control
[KKP11] implements a usage control enforcement infrastructure in a distributed system, using the connection of a smart meter to Facebook as an example. [KP12, KP13, KP14, KP18] develop concepts and technology to generically implement usage control at the level of networks, thus giving rise to a general mechanism for data usage control with data flow tracking in distributed systems.
Implementations
[PBHSW09] implements a usage control infrastructure for the X11 window system that does cater to information flow. [HP09] implements usage control with information flow tracking for a Linux operating system on the grounds of system call interposition. [LP11] applies the ideas of multi-layer usage control with information flow to a social network application. [KPPK11] implements usage control for Mozilla Firefox. [MLP11] implements a usage control infrastructure on top of a hypervisor. [TP12] implements a usage control monitor with information flow tracking for the Windows operating system. [FP12] implements usage control for Android. [FKP12] implements distributed usage control in a smart grid. [BP13] applies distributed usage control technology to smart camera systems. [KF16] shows how to perform usage control enforcement for third-party applications in social networks. Several Bachelor's theses have implemented usage control plugins for Google Chrome, Thunderbird, Excel, MySQL, and OpenOffice.
Overview papers
[PHB06] explains the basic ideas behind distributed data usage control without information flow tracking. [HPB07] applies the idea of distributed usage control (without information flow) to privacy and data protection. [P08] surveys the ideas behind distributed usage control without information flow. [P14] extends the idea of usage control to accountability infrastructures.
The Road Ahead
This line of research has spawned many further research activities. The work on runtime usage control monitors, specifically the development of a monitor for both API calls and quantitative data flows in the Windows operating system, turned out to be extremely useful for malware detection, as spelt out in Wüchner's 2016 dissertation on behavior-based malware detection with quantitative data flow graphs. This, in turn, led to a careful investigation of, and some remedies for, the problem of labeling data for malware detection with machine learning, discussed in Salem's 2021 thesis.
In addition, it was clear from the beginning that usage control enforcement can and should happen both in a detective (as with a speed trap) and a preventive (as with DRM) manner. Technically speaking, both variants are very similar. It quickly turned out, however, that preventive enforcement mechanisms (and most often, inhibiting mechanisms) led to system crashes, because software at all levels of the system stack is not always programmed in a defensive way. For instance, if we use system call interposition to make system calls return error codes to implement the inhibition of some action, the application that issued the system call is likely to crash. Moreover, users generally don't like to be inhibited, and it is a common observation that this may lead to users circumventing security mechanisms. We have thus focused on detective approaches and quickly found them to be part of what people call accountability infrastructures. Accountability is deeply intertwined with causality, which Ibrahim has studied in his 2021 thesis on Halpern-Pearl actual causality. Kacianka's 2022 thesis has tackled the problem of accountability from a system-wide perspective: what does it mean for a system to be accountable, and how can we ensure this?
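The contrast between preventive and detective enforcement described above can be illustrated with a small simulation. It is a sketch under simplifying assumptions: the interposed system call is modeled as an ordinary Python function, the policy ("no writes below `/media/usb/`") is invented for the example, and the application `export_report` is deliberately written without error handling to stand in for non-defensive software.

```python
def interposed_write(path, data, mode, audit_log):
    """Hypothetical monitor hook wrapped around a 'write' call."""
    violates_policy = path.startswith("/media/usb/")  # assumed example policy
    if violates_policy:
        # Both variants observe and record the violation.
        audit_log.append(f"policy violation: write to {path}")
        if mode == "preventive":
            # Inhibition: the call fails instead of being carried out.
            raise PermissionError(f"write to {path} inhibited")
    return len(data)


def export_report(mode, audit_log):
    """Non-defensive application: it does not expect the write to fail."""
    written = interposed_write("/media/usb/report.csv", b"id,value\n", mode, audit_log)
    return f"exported {written} bytes"


detective_log = []
detective_result = export_report("detective", detective_log)

preventive_log = []
try:
    preventive_result = export_report("preventive", preventive_log)
except PermissionError as error:
    # In a real system, this unhandled error would terminate the application.
    preventive_result = f"application crashed: {error}"

print(detective_result)
print(preventive_result)
```

In both modes the audit log contains the same entry. The detective variant leaves the application running and relies on the log for later accountability, whereas the preventive variant stops the action at the cost of an error the application does not handle.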
Finally, distributed data usage control infrastructures need to be secured: if it is too easy to circumvent them, their value is questionable. We have hence focused on software-based software integrity protection for hardening these infrastructures, but the results apply to software in general. Ahmadvand's 2021 thesis has looked into new software-based integrity protections, combinations of integrity protection, and specifically, the assessment of how "good" the protection is. Integrity protection mechanisms must be protected themselves. One software-based way is obfuscation, which Banescu has studied in his 2017 thesis.
Currently, we are extending distributed usage control technology and concepts to the workplace and applying them to the concept of Inverse Transparency.