{"id":755,"date":"2021-02-18T15:48:00","date_gmt":"2021-02-18T15:48:00","guid":{"rendered":"https:\/\/resources.illc.uva.nl\/illc-blog\/?p=755"},"modified":"2021-03-17T15:38:45","modified_gmt":"2021-03-17T15:38:45","slug":"machines-that-gaze-at-landscapes","status":"publish","type":"post","link":"https:\/\/resources.illc.uva.nl\/illc-blog\/machines-that-gaze-at-landscapes\/","title":{"rendered":"Machines that gaze at landscapes"},"content":{"rendered":"\n<p class=\"has-small-font-size\">18 February 2021,\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/twitter.com\/IrisProff\" target=\"_blank\">Iris Proff<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/resources.illc.uva.nl\/illc-blog\/wp-content\/uploads\/2021\/02\/sheep.gif\" alt=\"\" class=\"wp-image-764\" width=\"610\" height=\"457\"\/><figcaption>Image: <a rel=\"noreferrer noopener\" href=\"https:\/\/cocodataset.org\/\" target=\"_blank\">COCO dataset<\/a>, \u00a9 2015, COCO Consortium, eye-tracking data: <a rel=\"noreferrer noopener\" href=\"https:\/\/didec.uvt.nl\/\" target=\"_blank\">DIDEC dataset<\/a>, \u00a9 DIDEC team.<\/figcaption><\/figure>\n\n\n\n<p>Look at the picture above \u2013 what do you see? When this task was given to participants in an<a href=\"https:\/\/www.aclweb.org\/anthology\/C18-1310\/\" target=\"_blank\" rel=\"noreferrer noopener\"> experiment at the University of Tilburg<\/a>, they came up with these descriptions:&nbsp;<\/p>\n\n\n\n<p>&nbsp;\u201cUhm, a lot of sheep.\u201d&nbsp;<\/p>\n\n\n\n<p>&nbsp;\u201cA large group of sheep and goats that, uhm, are led by a donkey and a man.\u201d&nbsp;<\/p>\n\n\n\n<p>\u201cA landscape with mountains and some houses and a flock of sheep in the foreground.&#8221;<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignright size-medium\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"188\" src=\"https:\/\/resources.illc.uva.nl\/illc-blog\/wp-content\/uploads\/2021\/02\/man_and_dog-300x188.png\" alt=\"\" class=\"wp-image-768\" srcset=\"https:\/\/resources.illc.uva.nl\/illc-blog\/wp-content\/uploads\/2021\/02\/man_and_dog-300x188.png 300w, https:\/\/resources.illc.uva.nl\/illc-blog\/wp-content\/uploads\/2021\/02\/man_and_dog-1024x640.png 1024w, https:\/\/resources.illc.uva.nl\/illc-blog\/wp-content\/uploads\/2021\/02\/man_and_dog-768x480.png 768w, https:\/\/resources.illc.uva.nl\/illc-blog\/wp-content\/uploads\/2021\/02\/man_and_dog.png 1040w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption>When participants look at the dog first, they are biased to say \u201ca dog chasing a man\u201d. When they look at the man first, they are more likely to say \u201ca man being chased by a dog\u201d or \u201ca man is running away from a dog\u201d. Modified from <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0749596X07000186\" target=\"_blank\" rel=\"noreferrer noopener\">Gleitman et al. 2007<\/a>.<\/figcaption><\/figure><\/div>\n\n\n\n<p>Each of these descriptions is unique and focuses on different aspects of the picture \u2013 yet they are all accurate. What might determine which description a person chooses? 
Research suggests that the way we look at an image influences how we perceive and describe it (https://www.aclweb.org/anthology/C18-1310/).

When participants in a study from 2007 (https://www.sciencedirect.com/science/article/abs/pii/S0749596X07000186) described pictures showing two agents – such as a dog chasing a man or two people shaking hands – their descriptions were more likely to start with the agent they initially looked at. By manipulating where participants looked first, the experimenters could influence what they would say.

When we describe what we see, the seemingly disparate cognitive processes of language production and vision intertwine. But how exactly seeing and speaking align is not known. To address this question, the Dialogue Modelling Group (https://dmg-illc.github.io/dmg/) at the ILLC, led by Raquel Fernández, builds observations from human behavior into algorithms that automatically generate descriptions for images. This leads not only to more human-like automatic descriptions, but also to a deeper understanding of how vision and language may interact in the human brain.

Automatic image captioning

Automatic image captioning is an interesting task for the AI research community, as it combines the two prime applications of neural networks: computer vision and natural language processing. It poses a challenge that goes well beyond the typical dog-or-cat image classification. It tests the computer's ability to judge what an image essentially shows – what is important and what is not. An easy task for humans, but a very hard one for machines.

A state-of-the-art image captioning system, such as the one proposed by Anderson and colleagues in 2018 (https://openaccess.thecvf.com/content_cvpr_2018/html/Anderson_Bottom-Up_and_Top-Down_CVPR_2018_paper.html), works as follows. First, a neural network recognizes individual objects: a sheep, another sheep, a man, a mountain. Second, an attention component determines which of these objects are most relevant to the description. Inspired by visual processing in humans, the algorithm draws attention to areas with salient visual features such as high contrast or bright colors, and to areas containing objects of interest like faces or written letters.

Finally, a language model produces a sentence that describes these objects and how they relate to each other. This model aims to mimic how humans describe images and the visual world in general. However, while humans look at different parts of an image as they are describing it – sequentially, one object at a time – the computer model is fed all visual information at once.
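For readers who want a concrete picture of that pipeline, here is a minimal PyTorch sketch of the attend-then-describe loop, written for illustration only. The class, layer names and dimensions are invented for this post, the region features are assumed to come from a pretrained object detector, and the real system described by Anderson and colleagues is considerably more elaborate.

```python
# Illustrative sketch of an attention-based captioner (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCaptioner(nn.Module):
    def __init__(self, vocab_size, feat_dim=2048, hidden_dim=512, embed_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.att_region = nn.Linear(feat_dim, hidden_dim)   # project region features
        self.att_state = nn.Linear(hidden_dim, hidden_dim)   # project decoder state
        self.att_score = nn.Linear(hidden_dim, 1)            # relevance score per region
        self.decoder = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, regions, captions):
        # regions: (batch, n_regions, feat_dim) from an object detector
        # captions: (batch, seq_len) token ids, fed with teacher forcing
        batch, seq_len = captions.shape
        h = regions.new_zeros(batch, self.decoder.hidden_size)
        c = regions.new_zeros(batch, self.decoder.hidden_size)
        logits = []
        for t in range(seq_len):
            # attention: score every detected object against the current decoder state
            scores = self.att_score(torch.tanh(
                self.att_region(regions) + self.att_state(h).unsqueeze(1))).squeeze(-1)
            weights = F.softmax(scores, dim=-1)                  # which objects matter now
            context = (weights.unsqueeze(-1) * regions).sum(1)   # weighted mix of regions
            word = self.embed(captions[:, t])
            h, c = self.decoder(torch.cat([word, context], dim=-1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, seq_len, vocab_size)
```

At every step, the softmax weights play the role of the attention component: they decide which of the detected objects contribute to the next word.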
[Figure: object recognition in a visual scene. Picture taken from Anderson et al. 2018 (https://openaccess.thecvf.com/content_cvpr_2018/papers/Anderson_Bottom-Up_and_Top-Down_CVPR_2018_paper.pdf).]

Models that gaze at pictures

To overcome this limitation, Ece Takmaz, Sandro Pezzelle and Raquel Fernández at the ILLC and Lisa Beinborn at the VU Amsterdam expanded the state-of-the-art image captioning model (https://www.aclweb.org/anthology/2020.emnlp-main.377.pdf). They made it look at pictures like humans do – in sequence. "We take inspiration from cognitive science to make our models more human-like", says Ece.

When learning a picture-description pair, image captioning models receive the words a participant uttered one by one. Along with each word, the expanded model sees only the section of the image the participant fixated at that moment – like a human, and unlike the original model, which sees the whole image at once.

The researchers trained and tested their model on the Dutch Image Description and Eye-tracking Corpus (https://didec.uvt.nl/), a dataset collected at the University of Tilburg that contains images, spoken descriptions and eye-tracking data.
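As a rough illustration of what "seeing the fixated section along with each word" could involve, the sketch below pairs each uttered word with the image region fixated at its onset. The data layout, field names and the fixed-size crop are assumptions made for this post; they do not reproduce the DIDEC format or the authors' preprocessing.

```python
# Hypothetical alignment of spoken words with eye-tracking fixations.
from dataclasses import dataclass

@dataclass
class Fixation:
    x: int          # fixation centre, in pixels
    y: int
    start_ms: int   # onset and offset of the fixation
    end_ms: int

def fixation_at(fixations, t_ms):
    """Return the fixation active at time t_ms, or None if the eyes were moving."""
    for f in fixations:
        if f.start_ms <= t_ms <= f.end_ms:
            return f
    return None

def align_words_with_gaze(words, fixations, crop=100):
    """Yield (word, region) pairs, where region is a square box around the
    fixation that was active when the word was uttered (None if there was none)."""
    for word, onset_ms in words:
        fix = fixation_at(fixations, onset_ms)
        region = None
        if fix is not None:
            region = (fix.x - crop, fix.y - crop, fix.x + crop, fix.y + crop)
        yield word, region

# Example with made-up numbers: "sheep" is spoken while fixating the flock.
words = [("a", 250), ("lot", 600), ("of", 780), ("sheep", 950)]
fixations = [Fixation(x=320, y=210, start_ms=200, end_ms=700),
             Fixation(x=410, y=230, start_ms=750, end_ms=1200)]
for word, region in align_words_with_gaze(words, fixations):
    print(word, region)
```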
Does this novel method come up with better descriptions? Notably, for algorithms that generate language it is not obvious what "better" means. There is not a single correct description for an image, just as there is not a single correct short story, text summary or email reply. To measure how well such models perform, researchers usually measure how similar the model output is to the output of a human doing the same task.

According to such a metric, the gaze-driven model performs better than the model by Anderson and colleagues. That is, it generates more human-like image descriptions, in terms of both content and structure.
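The post does not name a specific metric; image captioning work commonly uses n-gram overlap scores such as BLEU or CIDEr, which compare a generated caption against human references. The toy function below is a deliberately simplified stand-in for that idea, counting how many of the model's bigrams also appear in a human description.

```python
# Simplified illustration of "similarity to a human description" (not the study's metric).
from collections import Counter

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

def overlap_score(candidate, reference):
    """Share of candidate bigrams found in the reference (crude precision-style score)."""
    cand = Counter(bigrams(candidate.lower().split()))
    ref = Counter(bigrams(reference.lower().split()))
    if not cand:
        return 0.0
    matched = sum(min(count, ref[bg]) for bg, count in cand.items())
    return matched / sum(cand.values())

human = "a large group of sheep and goats led by a donkey and a man"
print(overlap_score("a group of sheep and goats on a hill", human))   # closer to the human description
print(overlap_score("a man riding a horse on a beach", human))        # further from it
```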
Human-like image descriptions

[Figure: "A red bus and a bus". The gaze model is able to compress information: "Two parked busses". © 2015, COCO Consortium.]

What is it that makes the model more human-like? The researchers identify three key features. First, the model repeats less. "Repetition is something that is very difficult to get rid of in language generation models in general", says Raquel Fernández. "Models come up with descriptions like 'a man eating a pizza pizza pizza.'" The gaze-driven model tends to reduce such unnecessary repetitions.

Second, the gaze-driven model has a larger vocabulary and uses more specific words. When people focus on particular details in a picture, the model is likely to incorporate those in the description. "Still, our model lags behind the rich vocabulary used by human speakers", says Sandro Pezzelle.

[Figure: The model without gaze data describes this picture as "a picture of a street with some birds". The gaze model captures uncertainty in the picture: "a photo of a street, uhm, with some birds". © 2015, COCO Consortium.]

Finally, the gaze-driven model captures speech disfluencies: the "uhms", hesitations and corrections that speakers make. The picture on the right shows a particularly hard-to-describe scene, which is reflected not only in people's descriptions, but also in their gaze patterns. The gaze-driven model picks up on this difficulty and comes up with descriptions such as "a photo of a street, uhm, with some birds", but also "uhm, uhm, uhm, uhm and some birds". Interestingly, people seem to perceive artificial dialogue systems that use filler words and corrections as more pleasant to interact with (https://www.aclweb.org/anthology/W10-4301.pdf) than systems that stop speaking whenever they encounter an uncertainty.

Descriptions that are more human-like are not necessarily more accurate or better suited for practical applications of image captioning. "We are not trying to produce the most efficient image caption", says Raquel Fernández. "We rather try to reproduce the process by which humans describe images."

Her team's work confirms that vision and speech production align, and that this alignment is not simple – often people look at things which they don't (immediately) mention. The research also shows that incorporating multiple modalities and cognitive signals into AI tools is a promising pathway to making them better.

The important choice of language

"The long-term goal of this type of research is not just to describe photographs, but to connect the extra-linguistic world to language", says Raquel Fernández. It could be the starting point for the development of advanced tools for visually impaired people, or intelligent dialogue systems that assist you while driving a car.

When it comes to such practical applications, it is crucial that they are available in many languages. But the vast majority of research in the field is conducted on English data. The Dutch-language dataset chosen by the researchers at the ILLC thus posed a challenge, as models pre-trained on Dutch are rare. However, this seems to be slowly changing as natural language processing research expands to more languages, the scientists point out.

Research into different languages is also valuable from a scientific perspective. "In Turkish – my mother tongue – the order of words is very flexible. We can shuffle things around", says Ece Takmaz. "Maybe the alignment between vision and language proceeds in a different way here." In the near future, the researchers plan to investigate such cross-linguistic differences.