Technology in Practice
Over the past 14 years, I have worked for academic and commercial organizations, building specialized and curated search engines and discovery platforms for:
- Educational content
- Researchers
- Journalists
- Historical texts
- Job seekers and employers
I turn state-of-the-art technologies from fields such as Natural Language Processing (NLP), Machine Learning, and Deep Learning into practical applications.
I emphasize the importance of the human perspective and scientific methodology:
- Grounded evaluation: what is the best output in a specific context?
- Data curation, manual annotation, qualitative evaluation
- Efficient use of resources
- Active learning
A technological solution must be based on a full understanding of the users' purpose and specific requirements, rather than being a generic solution tuned to a benchmark (cf. The Benchmark Lottery).
Typical tasks are:
Information Retrieval and Search Engines
Information is key. But which information? And where is it?
- Keyword search: which documents are most relevant for a search term?
- Semantic search: which documents cover the most similar topics?
- Reranking: what is the best order of the results?
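The following is a rough illustration of keyword search. This minimal sketch ranks documents with Okapi BM25, a widely used term-weighting baseline. The whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75` (typical choices), and the function names `bm25_scores` and `keyword_search` are assumptions for illustration. They do not describe the setup of any particular project.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer; real systems use language-specific analyzers.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF with +1 inside the log keeps it positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def keyword_search(query: str, docs: list[str], top_k: int = 3) -> list[int]:
    """Return the indices of the best-matching documents, best first."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:top_k] if scores[i] > 0]
```

Take the documents `"open positions for data scientists"`, `"history of the printing press"`, and `"data science course for teachers"`. For these, the query `"data scientists"` returns `[0, 2]`. The third document matches only on `data`, because this plain tokenizer does not relate `scientists` to `science`. Gaps like this are one reason to add semantic search. A reranking stage can then reorder just these top candidates with a more expensive model.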
Language Modelling and Language Generation
Language is full of information. But how can a computer process it?
- Machine-readable vector representations of text, e.g. from encoder models such as BERT
- Generating human-readable text (natural language generation, automatic summarization)
- Retrieval-augmented generation (RAG)
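The sketch below shows how vector representations support semantic search and RAG, under simplifying assumptions. `TOY_VECTORS` and `toy_embed` are hypothetical 2-dimensional stand-ins for a real encoder such as BERT. `retrieve` ranks documents by cosine similarity, and `build_rag_prompt` (also a hypothetical name) inserts the retrieved passages into a prompt for a generative model. The call to the generative model itself is left out.

```python
import math
from typing import Callable

Vector = list[float]

# Hypothetical 2-d word vectors standing in for an encoder model such as BERT.
TOY_VECTORS = {
    "job": [1.0, 0.0], "vacancy": [0.9, 0.1], "career": [0.8, 0.2],
    "medieval": [0.0, 1.0], "manuscript": [0.1, 0.9],
}


def toy_embed(text: str) -> Vector:
    """Average the toy vectors of known words (a placeholder for a real encoder)."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], embed: Callable[[str], Vector], top_k: int = 2) -> list[str]:
    """Semantic search: rank documents by how close their vectors are to the query vector."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]


def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Put retrieved passages into a prompt for a generative language model."""
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the sources below.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}\nAnswer:"
    )
```

With the documents `"vacancy notice"`, `"medieval manuscript"`, and `"career advice"`, `retrieve("job", docs, toy_embed)` returns `["vacancy notice", "career advice"]`. The word "job" appears in neither document, which is the difference from keyword search. Because the encoder is passed in as a function, a real embedding model can replace `toy_embed` without changing the retrieval code.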
Classification
- Which subjects and age groups does an educational web page cover?
- Is a web page spam?
- Which role is described in a job description?
- What is the category of a document?
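As a minimal baseline for text classification, the sketch below implements multinomial Naive Bayes with add-one smoothing. It is applied to the spam question above. The class name and the tiny training set are hypothetical. In practice, a classifier built on a fine-tuned encoder model may well perform better, so treat this only as an illustration of the task.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) + sum over words of log P(word | label)
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Suppose the classifier is trained on `"win free money now"` and `"cheap pills free offer"` (labelled `spam`) and on `"lesson plan for fractions"` and `"biology worksheet for grade five"` (labelled `ok`). It then labels `"free money offer"` as `spam` and `"worksheet for fractions"` as `ok`.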
Sequence Labeling
- Extract persons, place names, and organizations (named entities) from a text.
- Which skills are mentioned in a job description?
- Where does a document begin and where does it end?
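Sequence labeling models typically assign one tag per token, often in the BIO scheme: `B-` begins a span, `I-` continues it, and `O` is outside any span. The sketch below covers only the decoding step that turns such tags into entity or skill spans. The tags are hand-written here as hypothetical model output, and the tag types `ROLE`, `LOC`, and `SKILL` are assumptions.

```python
def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group BIO-tagged tokens into (span_type, text) pairs."""
    spans: list[tuple[str, list[str]]] = []
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # Start a new span; a stray I- tag is treated like B- (lenient decoding).
            current_type = tag[2:]
            spans.append((current_type, [token]))
        elif tag.startswith("I-"):
            spans[-1][1].append(token)  # continue the current span
        else:
            current_type = None  # "O": outside any span
    return [(span_type, " ".join(words)) for span_type, words in spans]
```

Take the tokens `Senior data engineer in Berlin with Python skills` tagged `B-ROLE I-ROLE I-ROLE O B-LOC O B-SKILL O`. For this input, the function returns `[("ROLE", "Senior data engineer"), ("LOC", "Berlin"), ("SKILL", "Python")]`. The lenient handling of stray `I-` tags is a design choice. A stricter decoder could discard such spans instead.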
Topic Modelling, Clustering, Unsupervised Learning
- What clusters (groups) are there in a document collection?
- What are the topics discussed in a document collection?
Experience and Technologies
My main focus and experience are with the following technologies:
- Dehghani, M., Tay, Y., Gritsenko, A. A., Zhao, Z., Houlsby, N., Diaz, F., Metzler, D., & Vinyals, O. (2021). The Benchmark Lottery. arXiv. https://doi.org/10.48550/arXiv.2107.07002