Kavita Ganesan
1 min read · Nov 27, 2018

Hi Paolo! Yes, here is a real example. In one of my projects on language classification (programming languages) across 38 different languages, training currently takes less than 5 minutes on ~20,000 files. That is only possible because I'm using a model without long memory dependencies (unlike, say, deep-learning-based LSTM models). Had I used a model that takes a couple of hours to train and requires expensive GPUs, scaling up to roughly 300 languages (which is what GitHub supports) would be really expensive, both in training time and in the cost to sustain it.
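To make the lightweight-model point concrete, here is a minimal sketch of the kind of pipeline that trains this quickly. The post does not name the actual model or features, so the scikit-learn TF-IDF character n-grams and linear SVM below are illustrative assumptions, not the real project code.

# Minimal sketch of a fast, CPU-friendly programming-language classifier.
# Assumed setup: scikit-learn with TF-IDF character n-grams and a linear SVM;
# the actual model and features used in the project are not specified above.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def train_language_classifier(file_contents, languages):
    # file_contents: raw source-file strings; languages: one label per file
    clf = Pipeline([
        # Character n-grams capture keywords and syntax with no long-range
        # memory, so training stays in the minutes range on a CPU.
        ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4),
                                  max_features=100_000)),
        ("svm", LinearSVC()),
    ])
    clf.fit(file_contents, languages)
    return clf

# Hypothetical usage:
# model = train_language_classifier(texts, labels)  # ~20k files in minutes
# predicted = model.predict(["def main():\n    print('hello')"])

Swapping the linear model for an LSTM would bring back the hours of training and the GPU requirement, which is exactly the scaling concern with 300 languages.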

Responses (2)

@Paolo Messina yes, that's quite possible. However, it all depends on the features and the data itself. If the dataset does not provide enough vocabulary information about a language, then it's possible to confuse the classifier. So should the…

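To see that kind of confusion concretely, one hypothetical check (not something from the original discussion) is to inspect the trained classifier's confusion matrix and list the language pairs it mixes up most often; languages with heavily overlapping vocabularies, such as C and C++, tend to dominate when the training data is thin.

# Hypothetical diagnostic, reusing the sketched model above plus held-out data.
from sklearn.metrics import confusion_matrix
import numpy as np

def most_confused_pairs(model, test_texts, test_labels, top_k=5):
    labels = sorted(set(test_labels))
    cm = confusion_matrix(test_labels, model.predict(test_texts), labels=labels)
    np.fill_diagonal(cm, 0)  # ignore correct predictions
    pairs = [(labels[i], labels[j], int(cm[i, j]))
             for i in range(len(labels))
             for j in range(len(labels)) if cm[i, j]]
    # The largest off-diagonal counts are the language pairs the model confuses most.
    return sorted(pairs, key=lambda p: -p[2])[:top_k]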

Thanks. Does the model's performance degrade when you scale it to 300 languages? Precision/recall is also a factor that depends on the scale of the problem.
