Samir Passi and Steven J. Jackson. 2018. Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 136 (November 2018), 28 pages.
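This paper examines trust in corporate data science projects. The author Samir Passi took part in DeepNetwork's data science projects and served as lead data scientist on two of its business projects. He also conducted fieldwork, interviewing data scientists, project managers, product managers, business analysts, and company executives. The interview data, combined with his first-hand experience, form the empirical basis of the paper.
- Algorithmic witnessing: assessing model performance through technical means.
- Deliberative accountability: assessing models collaboratively, drawing on the multiple perspectives of experts from multiple domains.
- CSCW: Computer-supported cooperative work. It studies how people use technology to work toward shared goals.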
Researchers argue that “everything might be collected and connected, but that does not necessarily mean that everything can be known”. Data are never “raw”, often speaking specific forms of knowledge to power.
The idea of data-led objective discoveries entirely discounts the role of interpretive frameworks in making sense of data which are a necessary and inevitable part of interacting with the world, people and phenomena.
We describe how four common tensions in corporate data science work — (un)equivocal numbers, (counter)intuitive knowledge, (in)credible data, and (in)scrutable models — raise problems of trust, and show the practices of skepticism, assessment, and credibility by which organizational actors establish and re-negotiate trust under uncertain analytic conditions: work that is simultaneously calculative and collaborative. Highlighting the heterogeneous nature of real-world data science, we show how management and accountability of trust in applied data science work depends not only on preprocessing and quantification, but also on negotiation and translation – producing forms of what we identify as “algorithmic witnessing” and “deliberative accountability”. Trust in data science is therefore best understood as a deeply collaborative accomplishment, undertaken in the service of pragmatic ways of acting in an uncertain world.
Trust, Objectivity, and Justification
Early experimental science was thus a collective practice simultaneously social and technical: facts emerged through specific forms of sociability embedded within the experimental discourse.
As the work of interpretation shifted from the maker to the reader, scientific artifacts became open to interpretation, making consensus challenging. Agreements between multiple ways of seeing required differentiating between right and wrong interpretations: science required the correct “professional vision”. As mathematicians and physicists argued for “structural objectivity” characterized by forms of measurement and replicability, the role of “trained judgments” became salient. Scientists thus chased truth with “calibrated eyes” and standardized tools, simultaneously questioning the credibility of their findings, instruments, and knowledge.
As a form of intervening in the world, quantification necessitates its own ecologies of usability and valuation. Trust in numbers is therefore best understood as a variegated “practical accomplishment”, emanating from efforts at standardization and mechanization, along with forms of professional and institutional work.
A second line of work central to problems of trust in complex organizational settings is found in pragmatist traditions of social and organizational science. Dewey argues that instead of existing as a priori criteria, values (as the perceived or assigned worth of things) are continuously negotiated within decision-making. Processes of valuation are simultaneously evaluative (how to value?) and declarative (what is valuable?).
Data science is not just interested in calculating what is, but also “aspires to calculate what is yet to come”.
In complex organization settings, data science is transected by multiple experts, interests, and goals, relying upon and feeding into a plethora of practices such as business analytics, product design, and project management. Applied data science needs not only scientists and engineers, but also managers and executives.
In this paper, we address two of these mechanisms: algorithmic witnessing, in which data scientists assess model performance by variously, most technically, reproducing models and results; and, deliberative accountability, in which multiple experts assess systems through collaborative negotiations between diverse forms of trained judgments and performance criteria.
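As a rough illustration of what "reproducing models and results" can look like in practice, the sketch below re-runs an experiment under the same random seed and checks that the reported number comes back. It is not the paper's method: `run_experiment`, `witness`, the synthetic data, and the threshold rule are all hypothetical stand-ins for a real pipeline.

```python
import random


def run_experiment(seed, n=200):
    """Hypothetical experiment: score a fixed threshold rule on synthetic data."""
    rng = random.Random(seed)  # local generator, so separate runs share no state
    correct = 0
    for _ in range(n):
        x = rng.random()
        noise = rng.gauss(0, 0.1)
        label = 1 if x + noise > 0.5 else 0  # noisy ground truth
        pred = 1 if x > 0.5 else 0           # the "model" under assessment
        correct += int(pred == label)
    return correct / n


def witness(reported, seed, tolerance=1e-9):
    """Re-run the experiment and compare the reproduced accuracy to the reported one."""
    reproduced = run_experiment(seed)
    return abs(reproduced - reported) <= tolerance, reproduced


reported = run_experiment(seed=42)          # the number a colleague reports
ok, reproduced = witness(reported, seed=42)  # a second person re-derives it
print(f"reported={reported:.3f} reproduced={reproduced:.3f} match={ok}")
```

A check like this only establishes that a number can be re-derived. Whether the number is meaningful for the business problem is the part the paper assigns to deliberative accountability.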
Research Site, Methods, and Findings
During fieldwork, we began to encounter discrepancies between how different organizational actors (such as data scientists, project managers, and business analysts) articulated problems with and confidence in data and models.
Case 1 | Churn Prediction
There is stuff that you can predict and model, but it just seems unreasonable to a data scientist that you can create a model that perfectly models human behavior. The challenge with non-technical people is they think that computers can do more than they really can.
Even bad results or less than ideal results can be good — it is more than what you know now.
Certain highly-weighted features matched business intuitions, and everyone in the meeting considered that a good thing. Models that knew “nothing about business” had correctly identified certain aspects integral to business practices. Such forms of intuitive results were important not only for business analysts, but also for data scientists.
Case 2 | Special Finding
Working solutions to data-driven problems require creative mechanisms and situated discretion to work with messiness and around messiness.
During fieldwork, we saw several instances in which numbers considered sub-optimal were broken down into their constituent parts, while numbers assumed adequate or sufficiently high were often communicated and interpreted at face value.
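To make "broken down into their constituent parts" concrete, here is a minimal sketch, assuming a binary churn classifier, that decomposes a single accuracy figure into confusion-matrix counts, precision, and recall. The labels and the `breakdown` helper are invented for illustration and do not come from the paper.

```python
from collections import Counter


def breakdown(y_true, y_pred):
    """Decompose one accuracy figure into its confusion-matrix parts."""
    counts = Counter(zip(y_true, y_pred))  # keys are (true, predicted) pairs
    tp, fp = counts[(1, 1)], counts[(0, 1)]
    fn, tn = counts[(1, 0)], counts[(0, 0)]
    total = tp + fp + fn + tn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / total,
        # Guard against empty denominators when a class is never predicted or absent.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 1 = customer churned, 0 = customer stayed.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

report = breakdown(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value}")
```

In this made-up example the headline accuracy looks high while the model misses half of the customers who actually churned, which is the kind of detail that stays hidden when a number is taken at face value.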
Data scientists demarcated between algorithmic and human analytical approaches to justify perceived differences. The data science team argued that, unlike humans, algorithms statistically traverse the uneven contours of data, producing results that may sometimes appear unrecognizable or different. Counter-intuitive findings, they argued, can at times comprise novel forms of knowledge and not model mistakes.
Data scientists argued for a trade-off between understandability and effectiveness: state-of-the-art models were not entirely inspectable. As one data scientist said, the complexity of models is not a problem but the very reason why they work: a resource for model performance instead of a topic for analytic concern. Transparency remained a problematic ideal caught between multiple interpretations of inscrutability. Opacity was often perceived as a function of models’ black-boxed nature, necessitating detailed descriptions of algorithmic workings. Even when translucent, models remained recondite: their workings were complex; their results were hard to explain. Underscoring the import and value of results in these circumstances deemphasized complex descriptions and absent explanations. The question changed from how or why models worked to whether or how well they worked. “Implicit trust” took the place of complex descriptions. “Explicit verification” from real-world tests supplanted absent explanations.
Rather than a natural or inevitable property of data or algorithms themselves, the perceived trustworthiness of applied data science systems, as we show in this paper, is a collaborative accomplishment, emerging from the situated resolution of specific tensions through pragmatic and ongoing forms of work.
Corporate data science projects are inherently heterogeneous, comprised by the collaboration of diverse actors and aspirations. Project managers, product designers, and business analysts are as much a part of applied real-world corporate data science as are data scientists: the operations and relations of trust and credibility between data science and business teams are not outside the purview of data science work, but integral to its very technical operation.
Narrativization, as a form of doing, implicates data science between reality and possibility, between signal and noise — indeed, between life and data.
The incorporation of collaboration (e.g., interacting with non-data-scientists) and translation (e.g., effective communication of results) work into data science curricula and training is thus a good first step to ensure that would-be data scientists not only learn the skills to negotiate the trust in and credibility of their technical work, but also learn to see such forms of work as integral to the everyday work of data science. Or, to put it in terms of sociologists Harry Collins and Robert Evans, real-world applied data science projects require forms of both “contributory” and “interactional” expertise.
With current calls for more open documentation, corporate organizations need to document not only algorithmic functions and data variables, but also data decisions, model choices, and interim results. Organizations need to allocate additional resources and efforts to make visible and archive the seemingly mundane, yet extremely significant, decisions and imperatives in everyday data science work.
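In the conclusion, the authors return to open science and argue that much of the everyday data science workflow needs to be documented. This is indeed the case: analysis reports and model results are usually treated as the final deliverables, but the process of analysis and modeling is full of small details and the rationale behind decisions. These are an indispensable part of the reports and models, yet they are often left out of the final results. R Markdown and Quarto therefore fit the trend: they let the analysis and modeling process be shown selectively, which increases the transparency of the data science process and helps build credibility.
- Is data science regarded as an organizational knowledge asset because of its analytic capabilities, or because of competitive market pressure?
- If things go wrong, do members of the data science team bear heavy losses, or are they allowed to experiment and make mistakes?
- Who has the final say in a data science project: data scientists, project managers, business analysts, or business executives?
- Do data scientists work across business verticals, or are they assigned to specific business areas?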