Data science consultancy

I’m currently available for short-term, part-time consultancy projects!

What I do

AI systems are only as good as their input data. While NLP and AI systems are becoming more common, high-quality input data remains scarce. I’d like to help people solve this problem using my experience at the intersection of NLP, data science and data engineering.

Creating datasets is something I like doing and something I’m good at. I want to make sure that people with a research question have the right data to answer that question.

Things I can do for you:

  • determining whether a dataset is a good fit to answer a particular research question
  • creating new datasets
  • finding existing datasets
  • web scraping
  • tagging and annotation
  • data cleaning, filtering and preprocessing
  • data exploration

My specialism is working with textual data, such as dialogue transcriptions, product reviews, ads, vacancies, news articles, book texts, blog posts or emails.

If you need help with any of the above, let me know and we’ll schedule a Skype meeting to talk about the possibilities. :) Please note that, because of the coronavirus, I am only available for remote work.

What happens if we schedule a meeting?

I can do a lot of different things: I can code, I can gather, combine or clean datasets, and I can use my expertise to advise on project details. It all depends on what you need.

Some of the questions I will ask you during our first Skype meeting:

  • what do I need to know about you, your company or organisation, your expertise, your project, etc.?
  • what is the question that you are trying to answer with data?
  • which data do you already have?
  • which data is still missing or unavailable?
  • how will you use the data to answer your question?
  • do you need advice, software, a dataset, or maybe something else?

Depending on your answers to these questions, we will work out a project description, budget and planning together to solve your problem.

Contact page

Examples of past work

I’ve written blog posts about some of my projects. I think the post ‘Harry Potter and the Technical Twitter Meme’ gives a good impression of how I work in terms of data gathering, cleaning and analysis.

You can view some of my code on GitHub.

Most of the research projects for my PhD involve some kind of dataset creation or data preprocessing. All of my research papers are peer-reviewed and open access. You can find full-text links on my Publications page.