NGCM Seminar: Data Science and Cloud Tools for Social Science

On November 18th 2020, Dr Graham Hesketh, founder and CTO of the data science and cloud technology start-up Opsmorph, gave a talk on the transition from academia to data science in industry, with a focus on social science problems.

Dr Hesketh’s presentation discussed the transition from academic scientific computing to data science in industry in three main sections. First was an anecdotal reflection on the similarities and differences between the two fields, both in terms of work content and the required technical skills. Second, the typical machine learning techniques used in social science research in industry were discussed, with example case studies presented. Finally, Dr Hesketh highlighted some access routes to careers in data science and cloud computing.

Dr Hesketh began the first section by outlining his career path. His academic background comprised undergraduate studies in Theoretical Physics and a PhD in Computational Modelling of Fibre Optic Communications, and he became interested in data science towards the end of his postdoctoral research position. He pursued this interest through books, video tutorials, and online courses, and by starting some personal projects. Following his postdoc, he took up a data scientist role at Trilateral Research, working on technological tools for social problems in diverse areas including false information online, youth offending, and child sexual exploitation. Dr Hesketh outlined several notable similarities between this role and his previous work in academia, including involvement in UK Innovation Research projects, partnering with academics, and writing proposals early in the project pipeline. The technical skills also overlapped: using regression to fit models to data, Fourier analysis, numerical solutions to differential equations, and working with clustered computing hardware. He also discussed the differences between academia and data science in industry, namely the client-centric work content, working on multiple projects at any given time, and additional responsibilities outside of data science research.

In the second section, Dr Hesketh discussed some of the machine learning techniques he had come across during his time at Trilateral Research, along with example applications and research questions. These included geospatial data processing, commonly used to investigate how rates of crime, suicide, or mental health issues, among others, vary between areas; natural language processing, typically tackled with deep learning techniques and used to recognise sexual content in online text and to automatically remove child exploitation material online; and network analysis, in which weighted nodes and edges representing different variables are analysed with graph theory. Network analysis was used in research on the networks of youth offenders, where the nodes (each representing a person) were weighted by a criminal offences score.
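To make the network analysis idea concrete, the following is a minimal sketch in Python using only the standard library. It is not the method used in the research described in the talk: the people, scores, tie strengths, and the function names (`build_adjacency`, `weighted_degree`, `neighbour_exposure`, `connected_components`) are all hypothetical and chosen purely for illustration. It shows a small undirected graph in which each node carries a score and each edge carries a weight, plus three simple graph measures.

```python
from collections import defaultdict

# Hypothetical data: each node is a person, weighted by an illustrative score.
node_score = {"A": 5.0, "B": 1.0, "C": 3.0, "D": 0.0, "E": 2.0}

# Hypothetical undirected edges: (person, person, tie strength).
edges = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "C", 1.0), ("D", "E", 0.5)]


def build_adjacency(edge_list):
    """Return a mapping node -> {neighbour: edge weight} for an undirected graph."""
    adj = defaultdict(dict)
    for u, v, w in edge_list:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def weighted_degree(adj, node):
    """Sum of the weights of all edges attached to a node."""
    return sum(adj[node].values())


def neighbour_exposure(adj, scores, node):
    """Sum of neighbours' scores, each scaled by the strength of the tie."""
    return sum(w * scores[nbr] for nbr, w in adj[node].items())


def connected_components(adj, nodes):
    """Group nodes into sets that are linked by some path (depth-first search)."""
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            # Visit neighbours that have not been processed yet.
            stack.extend(adj[current].keys() - seen)
        components.append(component)
    return components


adj = build_adjacency(edges)

for person in node_score:
    print(person, weighted_degree(adj, person),
          neighbour_exposure(adj, node_score, person))

print(connected_components(adj, node_score))
```

In practice a dedicated graph library would likely replace these hand-written helpers, and the choice of measure would depend on the research question being asked.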

Dr Hesketh then discussed other software engineering skills he had used or encountered in the data science pipeline during his time at Trilateral Research. These included front-end development of browser-based user dashboards, which requires languages such as HTML, JavaScript, and CSS, together with data visualisation packages such as Plotly. He also discussed cloud computing as a cost-effective and scalable way to host web applications. He cited as its main benefit the reduced need for server management, configuration, and security work, since all of these are handled by the host platform. During his work at Trilateral Research, cloud computing was used to develop an online dashboard for controlling air quality in offices to optimise employee performance.

Lastly, Dr Hesketh discussed potential pathways into data science careers from scientific computing backgrounds in academia. These included collaborating with industry throughout academic research, subcontracting to technology firms on data science projects, and taking on paid internships. Dr Hesketh concluded by stating that the demand for data science skills is growing and outstrips supply in the job market, and that computational researchers can prepare for such roles by using Python and machine learning techniques in their research, reading free online tutorials, and practising building their own models.

Written by Liam Tope