Posts
All the articles I've posted.
What we learned by accelerating by 5X Hugging Face generative language models
Posted on: February 9, 2022
Two trends are ongoing in the NLP ecosystem: bigger language models and better text generation. Both are NLP game changers (zero-shot, etc.), but they bring their own challenges: how do you perform inference with them? At what co…
4.5 times faster Hugging Face transformer inference by modifying some Python AST
Posted on: December 29, 2021
Recently, the 🤗 Hugging Face team released a commercial product called Infinity that performs inference with very high performance (i.e., very fast compared to a PyTorch + FastAPI deployment). Unfortunately it’s a paid p…
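The speedup in this post comes from rewriting model code at the Python AST level. As a minimal sketch of that general mechanism only (the `slow_op`/`fast_op` names are hypothetical placeholders, not the post's actual transformation):

```python
import ast
import textwrap

# Toy illustration of AST rewriting: parse a function's source,
# swap a call target, recompile, and execute the patched code.
source = textwrap.dedent("""
def forward(x):
    return slow_op(x) + 1
""")

class SwapCall(ast.NodeTransformer):
    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Rename references to `slow_op` into `fast_op`.
        if node.id == "slow_op":
            node.id = "fast_op"
        return node

tree = SwapCall().visit(ast.parse(source))
ast.fix_missing_locations(tree)

namespace = {"fast_op": lambda x: x * 2}  # stand-in "optimized" kernel
exec(compile(tree, filename="<ast>", mode="exec"), namespace)
print(namespace["forward"](3))  # 7
```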
1st ever method to perform *GPU* quantization on most 🤗 HF transformer models: > 2X faster inference!
Posted on: December 10, 2021
Quantization is a technique that significantly accelerates inference by replacing high-precision tensors with a lower-precision representation, in a way that keeps accuracy intact (or close to it). It’s quite common in CPU in…
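For a feel of the underlying idea (a generic illustration, not the post's GPU quantization method), symmetric int8 quantization of a tensor looks roughly like this in PyTorch:

```python
import torch

# Symmetric int8 quantization sketch: map the fp32 value range onto
# [-127, 127], store int8, and dequantize back with the same scale.
x = torch.randn(4, 4)

scale = x.abs().max() / 127
x_int8 = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
x_deq = x_int8.float() * scale  # dequantized fp32 approximation

print("max abs error:", (x - x_deq).abs().max().item())
```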
Python library to optimize Hugging Face transformer for inference: < 0.5 ms latency / 2850 infer/sec
Posted on: November 24, 2021
We just launched a new open source Python library that helps optimize Transformer model inference and prepare deployment in production. It’s a follow-up to a proof of concept shared earlier. Scripts have been conve…
Optimization of Hugging Face Transformer models to get Inference < 1 Millisecond Latency + deployment on production ready inference server
Posted on: November 5, 2021
Hi, I just released a project showing how to optimize big NLP models and deploy them on the Nvidia Triton inference server.
Hugging Face Transformer Inference Under 1 Millisecond Latency
Posted on: November 5, 2021
Go to production with Microsoft and Nvidia open source tooling.
Divide Hugging Face Transformers training time by 2 or more with dynamic padding and uniform length batching
Posted on: May 20, 2020
Reducing training time lets you iterate more within a fixed time budget and thus achieve better results.
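Dynamic padding means padding each batch only to its own longest sequence rather than to a fixed model-wide maximum, and uniform length batching groups sequences of similar length together so little padding is needed. A minimal sketch under those assumptions (function and variable names are illustrative, not the post's code):

```python
import torch

def collate_dynamic_padding(batch, pad_token_id=0):
    # Pad each batch only to its own longest sequence.
    max_len = max(len(seq) for seq in batch)
    input_ids = torch.full((len(batch), max_len), pad_token_id, dtype=torch.long)
    attention_mask = torch.zeros(len(batch), max_len, dtype=torch.long)
    for i, seq in enumerate(batch):
        input_ids[i, : len(seq)] = torch.tensor(seq, dtype=torch.long)
        attention_mask[i, : len(seq)] = 1
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Uniform length batching: sort examples by length first so sequences
# of similar size land in the same batch and padding stays minimal.
examples = [[101, 7, 8, 102], [101, 7, 102], [101, 7, 8, 9, 10, 102]]
batch = collate_dynamic_padding(sorted(examples, key=len)[:2])
print(batch["input_ids"].shape)  # torch.Size([2, 4])
```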
fastrtext — fastText for R, without the papercuts
Posted on: February 15, 2020
An R wrapper around Facebook's fastText library for swift text classification and word vectors.
Pushing open data from inside a legal publisher (2019): two pro bono partnerships in France & Luxembourg
Posted on: January 15, 2020
In 2019 we ran two pro bono partnerships to open up court decisions: one with Etalab (the French government’s open data unit, within DINUM) and the Cour de cassation (France’s supreme court), the other with Luxembourg’s Prosecutor General, focusing on engineering speedups for anonymization and an end-to-end PoC.
NER algo benchmark: spaCy, Flair, m-BERT and camemBERT on anonymizing French commercial legal cases
Posted on: December 10, 2019
Does (model) size matter?