Collaborative data analysis using SWISH DataLab

Jan Wielemaker

The SWISH DataLab addresses two of the main bottlenecks of data science: bringing data from different sources together, and cleaning and selecting the data that is relevant for further analysis.

SWISH unites SWI-Prolog and R behind a web-based IDE that resembles Jupyter notebooks. The platform allows multiple data scientists to work on the same data simultaneously, while rule sets can be reused and shared between users. This makes it easier for data scientists to provide more complex data transformation steps to domain experts.

Most pipelines use a general-purpose programming language such as Python to clean the data and ingest it into a linked data store or RDBMS. The relevant data is then selected and appropriate machine learning is applied. In contrast, SWISH data management is based on Prolog, a relational and logic-based language. External data sources, such as RDBMS, Linked Data, CSV files, XML files and JSON, are made available through a mixture of adaptors, which expose the data in Prolog’s relational model without transferring it, and ingestion, which loads the data into Prolog. Using the data in a unified framework without having to transfer it makes it simpler to bring the data together.

Subsequently, declarative rules define a clean and coherent view on the data that is targeted towards analysing it. Given the logic basis of Prolog, this view is modular, concise and declarative, making it easy to maintain. SWI-Prolog’s tabling extension provides the same termination properties as Datalog, as well as the same order independence of rules, within the subset Prolog shares with Datalog. Tabling also provides caching of results. At the same time, users have access to the more general Prolog language to code transformations that are not supported by Datalog. According to Wikipedia, “In recent years, Datalog has found new application in data integration, information extraction, …”. SWISH adds collaboration, as well as Turing completeness to deal with transformations that Datalog cannot express, in a single coherent environment.
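The termination property mentioned above can be illustrated with a minimal sketch. It is written in Python rather than Prolog, and it is not how SWI-Prolog implements tabling; it only mimics Datalog-style bottom-up evaluation. The rule set, the function name transitive_closure and the example edges are assumptions made for illustration. On a cyclic graph, a left-recursive path rule loops forever under plain Prolog resolution, whereas evaluating to a fixpoint over a finite set of facts always stops and yields the same answers whatever the order of the rules.

```python
def transitive_closure(edges):
    """Bottom-up evaluation of the (hypothetical) Datalog rules:

        path(X, Y) :- edge(X, Y).
        path(X, Y) :- path(X, Z), edge(Z, Y).

    `edges` is a set of (from, to) pairs. The loop stops as soon as an
    iteration derives no new fact, so it terminates on cyclic graphs.
    """
    path = set(edges)  # first rule: every edge is a path
    while True:
        # second rule: extend every known path by one edge
        derived = {(x, y) for (x, z) in path for (z2, y) in edges if z == z2}
        new_facts = derived - path
        if not new_facts:  # fixpoint reached
            return path
        path |= new_facts


# A cyclic graph: a -> b -> c -> a (illustrative data only)
edges = {("a", "b"), ("b", "c"), ("c", "a")}
closure = transitive_closure(edges)
print(sorted(closure))  # all 9 ordered pairs over {a, b, c}
```

In SWI-Prolog itself, the corresponding behaviour is obtained by declaring the predicate as tabled with the table directive, after which the rules can be written in either order.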

The SWISH DataLab can be configured to allow both authenticated users and anonymous users with limited access rights.
Notebooks and programs are stored in a Git-like repository and are fully versioned.
Results can be reproduced reliably by creating a snapshot of a query and all relevant programs.
Data views defined in SWISH can be downloaded as CSV and accessed through a web-based API.
The platform can be deployed on your laptop as well as on a server.

The SWISH DataLab provides a high-level platform to select and combine data sources in multiple workflows, while relying on tools that are commonly used by data analysis professionals.

Everything you need to get started with the SWISH DataLab is available as open source software:
