FusionBrain: Research Project in Multimodal and Multitask Learning
Abstract
FusionBrain is a research project aimed at developing efficient multitask and multimodal models and applying them to a wide variety of practical tasks. The overall goal of the project is to learn how to build models that extract additional useful knowledge from a large number of data modalities and training tasks and, as a result, solve other tasks better. The research covers many modalities: texts, images, audio, video, programming languages, graphs (e.g., molecular structures), time series, and so on. The list of tasks to be solved is large and ranges from classical tasks in computer vision and natural language processing to tasks involving several modalities: VideoQA, Visual Commonsense Reasoning, and IQ tests (which are difficult to solve even for humans). The project also studies the ability of models to solve tasks formulated in natural or visual languages and to cope with hidden tasks (for which there were no examples in the training set). Among other things, the studies focus on reducing the data, human effort, and computational resources required at the training and inference stages. Some results concerning the study and development of multimodal and multitask architectures are described in this paper.