
Technology & Software

Technology in Practice

In the past 14 years, I have worked for academic and commercial organizations to build specialized and curated search engines and discovery platforms for:

  • Educational content
  • Researchers
  • Journalists
  • Historical texts
  • Job seekers and employers

I turn state-of-the-art technologies in fields such as Natural Language Processing (NLP), Machine Learning, and Deep Learning into practical applications.

I emphasize the importance of the human perspective and scientific methodology:

  • Grounded evaluation: what is the best output in a specific context?
  • Data curation, manual annotation, qualitative evaluation
  • Efficient use of resources
  • Active learning
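Active learning lets the current model choose which unlabeled examples a human should annotate next, so that annotation effort goes where it helps most. Below is a minimal sketch of one common strategy, least-confidence uncertainty sampling. The predicted class probabilities are invented for illustration, and `least_confident` is a hypothetical helper name.

```python
# Hypothetical illustration: helper name, document ids and probabilities are made up.

def least_confident(probabilities: dict[str, list[float]], batch_size: int) -> list[str]:
    """Return the ids of the unlabeled items whose top class probability is lowest."""
    # Confidence of a prediction = probability assigned to its most likely class
    confidence = {doc_id: max(probs) for doc_id, probs in probabilities.items()}
    # Least confident first; these are the items sent to human annotators
    return sorted(confidence, key=confidence.get)[:batch_size]


# Class probabilities as the current model might predict them for unlabeled items
predictions = {
    "doc-a": [0.98, 0.02],
    "doc-b": [0.55, 0.45],
    "doc-c": [0.70, 0.30],
    "doc-d": [0.51, 0.49],
}
print(least_confident(predictions, batch_size=2))  # ['doc-d', 'doc-b']
```

After annotation, the newly labeled items are added to the training set, the model is retrained, and the loop repeats.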

A technological solution must start from a full understanding of the users' purpose and specific requirements, rather than offering a generic solution chosen for its performance on a benchmark (cf. The Benchmark Lottery).

Typical tasks are:

Information Retrieval and Search Engines

Information is key. But which information? And where is it?

  • Keyword search: which documents are most relevant for a search term?

  • Semantic search: which documents cover topics most similar to the query?

  • Reranking: what is the best order of the results?
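Keyword search is usually scored with a ranking function such as Okapi BM25, which is also the default similarity in Elasticsearch. The sketch below is a from-scratch illustration, not Elasticsearch's implementation: it assumes a naive lowercase whitespace tokenizer, made-up documents, and the commonly used parameter values `k1=1.2` and `b=0.75`; `bm25_scores` is a hypothetical helper name.

```python
import math
from collections import Counter

# Hypothetical illustration: helper names and documents are made up.


def tokenize(text: str) -> list[str]:
    # Naive tokenizer; real search engines use language-specific analyzers
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs  # average document length
    # Document frequency: in how many documents each term occurs
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF with +1 inside the log, which keeps it non-negative
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b)
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores


docs = [
    "python developer with search engine experience",
    "history teacher for secondary school",
    "search engineer elasticsearch python",
]
scores = bm25_scores("python search", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1]: the shorter matching document ranks first
```

Both matching documents contain each query term once, so the length normalization controlled by `b` decides the order. A reranking step would then re-score only the top of such a list with a more expensive model.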

Language Modelling and Language Generation

Language is full of information. But how can a computer process it?

Classification
  • Which subjects and age groups does an educational web page cover?

  • Is a web page spam?

  • Which role is described in a job description?

  • What is the category of a document?
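A text classifier, for instance a spam detector for web pages, can be sketched with multinomial Naive Bayes, a simple baseline worth trying before heavier models. This is a minimal sketch under stated assumptions: the training texts and labels are made up, tokenization is a naive lowercase split, and `train_nb` / `predict_nb` are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict

# Hypothetical illustration: helper names, texts and labels are made up.


def train_nb(examples: list[tuple[str, str]]):
    """Count label frequencies and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict_nb(model, text: str) -> str:
    """Return the label with the highest log posterior for the text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Log likelihood with add-one (Laplace) smoothing
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("buy cheap pills now", "spam"),
    ("cheap watches buy now", "spam"),
    ("lesson plan for fractions", "not_spam"),
    ("history worksheet for students", "not_spam"),
]
model = train_nb(training)
print(predict_nb(model, "cheap pills"))                       # spam
print(predict_nb(model, "fractions worksheet for students"))  # not_spam
```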

Sequence Labeling
  • Extract persons, place names, and organizations (named entities) from a text.

  • Which skills are mentioned in a job description?

  • Where does a document begin and end?
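Sequence labeling models typically assign one tag per token, often in the BIO scheme: `B-` begins a span, `I-` continues it, and `O` is outside any span. Turning predicted tags back into spans is a small but error-prone step. The sketch below assumes the tags have already been predicted (here they are written by hand for a made-up sentence), and `bio_to_spans` is a hypothetical helper name. The same scheme could, for example, mark skills in a job description.

```python
# Hypothetical illustration: helper name, sentence and tags are made up.


def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group BIO-tagged tokens into (span_type, text) pairs."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the currently open span, if any
        if current_type:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # Start a new span; an I- tag that does not continue the open span is treated as B-
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open span
        else:  # "O": outside any span
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


tokens = ["Anna", "Schmidt", "works", "at", "TU", "Darmstadt", "in", "Germany", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('PER', 'Anna Schmidt'), ('ORG', 'TU Darmstadt'), ('LOC', 'Germany')]
```

Treating a stray `I-` tag as the start of a new span is one design choice; another is to drop it.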

Topic Modelling, Clustering, Unsupervised Learning
  • What clusters (groups) are there in a document collection?

  • What are the topics discussed in a document collection?
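One common way to find clusters is k-means, applied once each document has been turned into a numeric vector (for example TF-IDF weights or embeddings). The sketch below runs plain k-means (Lloyd's algorithm) on made-up 2-D points that stand in for document vectors; `kmeans` is a hypothetical helper name. Topic models such as LDA work differently and are not shown here.

```python
import math
import random

# Hypothetical illustration: helper name and points are made up.


def kmeans(points: list[list[float]], k: int, n_iter: int = 20, seed: int = 0):
    """Plain k-means: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return labels, centroids


# Two well-separated groups of "document vectors"
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Library implementations such as scikit-learn's `KMeans` add smarter initialization (k-means++) and convergence checks.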

Experience and Technologies

Employers (Past & Current) and Education
Netherlands eScience Center
Wizenoze
Zeta Alpha
Textkernel
UKP Lab, TU Darmstadt
Institut für Deutsche Sprache
TU Darmstadt
University of Edinburgh, Speech & Language Processing
University of Heidelberg, Computational Linguistics & Applied Computer Science

My main focus and experience are with the following technologies:

Natural Language Processing (NLP) and Machine Learning
Hugging Face
spaCy
FastText
Stanford CoreNLP
DKPro
PyTorch
scikit-learn
NumPy
Search, Retrieval, Data Engineering
Elasticsearch
Weaviate
Kubernetes
Spark
Hadoop
MySQL
SQLite
Docker
AWS
Spring
Data Analysis and Visualization
Pandas
Matplotlib
Jupyter
Bokeh
Seaborn
Programming Languages
Python
Java
Scala
Perl
Bash
Mapping & GIS
GeoPy
OpenStreetMap
References
  1. Dehghani, M., Tay, Y., Gritsenko, A. A., Zhao, Z., Houlsby, N., Diaz, F., Metzler, D., & Vinyals, O. (2021). The Benchmark Lottery. arXiv. 10.48550/ARXIV.2107.07002