Han Xiao

Tencent != 10¢

  • From Internet to WeChat
  • Basically Google for China + they managed to be good at social media and messaging
  • They have a lot of data and they can probably use it, because there is no GDPR in China

Tencent AI

  • 70 AI scientists + 300 app developers (== ML Engineers)
  • #1 in AI publications in China, yet still only 1/10 of Google's paper count
Source: github/GNES

GNES ‒ Generic Neural Elastic Search

  • Should be used like TensorFlow (TF), except that every component is deployed in a different place
  • A bit like Kubeflow/Airflow, but optimized for the search scenario

Premises

  • Cloud native
  • Semantic search (using DNN)
  • End2end

Obstacles

  • What is the distance metric for doc vectors?
  • How to handle the difference between short and long texts (likewise for video and images)?
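
To make the first obstacle concrete: one common choice for comparing dense document vectors is cosine similarity, which ignores vector length and compares direction only. The talk did not state which metric GNES uses, so the sketch below is purely illustrative; the function names and the toy vectors are assumptions, not part of GNES.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy 2-dimensional "document vectors" (hypothetical values).
docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
order = rank([1.0, 0.0], docs)
```

Because the score depends only on direction, a vector and a scaled copy of it compare as identical. That is one reason this metric is often considered when short and long inputs produce vectors of very different magnitude.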

The Idea

  • Minimum information unit × minimum semantic unit × optimum semantic unit
  • a word × a sentence × ???
  • a pixel × a 64×64 patch × ???
  • a pixel in 1 frame × a 2-3 sec shot × ???
  • What are the optima?
    • Do an experiment!
      • => determined by ONE preprocessor
  • Models change too quickly
    • => plug them in as a Docker container
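
The "do an experiment" step above can be sketched for text: split a document into sentences (the minimum semantic unit) and then try different group sizes as candidate "optimum" units. This is a minimal sketch under stated assumptions; `split_sentences`, `chunk`, and the naive punctuation-based splitting rule are hypothetical and are not GNES's preprocessor.

```python
import re


def split_sentences(text):
    """Naively split text into sentences at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk(sentences, size):
    """Group consecutive sentences into chunks of `size` sentences each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


text = "GNES is cloud native. It does semantic search. It is end to end."
sentences = split_sentences(text)

# Candidate unit sizes to compare in an experiment (hypothetical values).
candidates = {size: chunk(sentences, size) for size in (1, 2, 3)}
```

Each candidate size yields a different set of units to encode and index, so only this one preprocessing step has to change between experiments.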

Specs

  • Everything is its own microservice
  • All of the components are defined by a YAML file => immutable code, just change the YAML
Overview of possible parts
Source: github/GNES
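
The "immutable code, just change the YAML" idea can be sketched as a registry of components plus a configuration that selects and parameterizes them. The configuration below is written as the Python dict a YAML parser would produce, to keep the example free of third-party dependencies. The schema, the component names, and the parameters are all hypothetical; they do not reproduce the actual GNES YAML format.

```python
# Hypothetical configuration, as it might look after parsing a YAML file.
CONFIG = {
    "pipeline": [
        {"name": "lowercase"},
        {"name": "tokenize", "params": {"sep": " "}},
        {"name": "truncate", "params": {"max_tokens": 3}},
    ]
}

REGISTRY = {}


def component(name):
    """Decorator that registers a function under a name usable in the config."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register


@component("lowercase")
def lowercase(data):
    return data.lower()


@component("tokenize")
def tokenize(data, sep=" "):
    return data.split(sep)


@component("truncate")
def truncate(data, max_tokens=10):
    return data[:max_tokens]


def build_pipeline(config):
    """Look up each configured component and chain them in order."""
    steps = [(REGISTRY[spec["name"]], spec.get("params", {}))
             for spec in config["pipeline"]]

    def run(data):
        for fn, params in steps:
            data = fn(data, **params)
        return data

    return run


result = build_pipeline(CONFIG)("Generic Neural Elastic Search")
```

Reordering, removing, or re-parameterizing a step only touches the configuration; the component code stays unchanged.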

Questions

  • Are you happy with the “All as microservice” design?
  • Do you have any idea what the overhead is?
    • The overhead is significant only if the processing done by each component is small
      • => great for images or, even better, video (just FFmpeg)
  • Is the presentation somewhere?
    • Maybe
  • Can it be put into lambda functions?
    • Not for now
    • Not because of the loading
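
The answer to the overhead question can be made concrete with simple arithmetic: a fixed per-call cost for crossing a service boundary matters a lot for cheap steps and very little for expensive ones. The numbers below are invented for illustration only; they are not measurements of GNES.

```python
def overhead_share(compute_ms, overhead_ms):
    """Fraction of the total time per call that is spent on service overhead."""
    total = compute_ms + overhead_ms
    if total <= 0:
        raise ValueError("total time must be positive")
    return overhead_ms / total


# Hypothetical fixed cost of one microservice hop (serialization + network).
HOP_MS = 5.0

cheap_step = overhead_share(compute_ms=5.0, overhead_ms=HOP_MS)      # e.g. a tiny text transform
heavy_step = overhead_share(compute_ms=495.0, overhead_ms=HOP_MS)    # e.g. decoding a video shot
```

With these made-up values, half of the time of the cheap step is overhead, while the heavy step spends about 1% on it.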

  • How to address multiple languages – when preprocessing is completely different?
    • It is your job to write a specific YAML file that preprocesses it
  • How to handle huge models in Docker images?
    • Just put everything into the container
  • What is the view in China on big OpenSource contributions?
    • They do not get much support for open source (a lot of paperwork, deciding what to publish, etc.)
    • 30% of the work is relationship building
  • Challenges for Europe with regard to AI
    • A lot of Chinese AI is focused on the consumer (you can test something on this million data points and that million data points)
    • => not much 0 to 1 research, a lot of 1 to N research
    • Don’t try to copy the B2C of the US and China
    • Try to focus on the middleman: companies that can change the world a lot but are not “shiny” (no coverage)
      • Upgrade the traditional industry! Keep Europe great still!