NCA-GENM Free Practice Questions: "NVIDIA Generative AI Multimodal"
You are building an AI model that takes video and corresponding subtitles as input to generate short summaries of video content. Which of the following strategies are most important to reduce the chance of your model generating biased summaries? (Select all that apply)
Correct answer: B, C, E
You are training a conditional generative model to generate images based on text descriptions. You notice that the generated images often lack fine-grained details and tend to be blurry, even though the overall structure matches the text description. Which of the following techniques would be MOST effective in improving the image quality and adding finer details?
Correct answer: A
You are building a multimodal model to generate realistic dialogues between virtual characters in a game. The model takes as input the current game state (including character positions, objects, and environment), the character's personality profile (text), and the previous dialogue utterances (text and audio). What specific techniques can you employ to ensure that the generated dialogues are contextually relevant, coherent, and emotionally appropriate?
Correct answer: C
You're using NVIDIA Triton to serve a multimodal model: a CLIP text encoder and a StyleGAN image generator. You need to ensure high throughput and minimal latency. Which Triton backend configuration is most suitable for this scenario, assuming both models are optimized for NVIDIA GPUs?
Correct answer: D
Consider a scenario where you're training a generative AI model to create realistic images from text descriptions. You notice that the generated images lack fine-grained details and appear blurry. Which of the following loss functions or training techniques could you employ to improve the image quality and sharpness?
Correct answer: E
You are analyzing a dataset of customer reviews for a new product using Natural Language Processing (NLP). The dataset contains both positive and negative reviews, but a significant portion of the negative reviews uses sarcasm. Which of the following NLP techniques would be MOST effective in accurately identifying the sentiment expressed in sarcastic reviews?
Correct answer: A
You are tasked with optimizing a U-Net model for real-time image segmentation on an embedded device with limited GPU memory. The original model is trained in FP32 precision. Which of the following techniques, when applied together, would likely yield the best trade-off between accuracy and performance?
Correct answer: D
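As background for the FP32 and limited-memory aspects of the U-Net question above, here is a minimal sketch of how weight storage scales with numeric precision. The byte sizes are the standard ones for each format; the parameter count and the helper name `weight_memory_mib` are hypothetical, chosen only for illustration. The sketch covers weight storage only (not activations) and says nothing about accuracy.

```python
# Bytes per element for common numeric precisions (standard sizes).
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mib(num_params: int, precision: str) -> float:
    """Return the memory needed to store num_params weights, in MiB."""
    return num_params * BYTES_PER_ELEMENT[precision] / (1024 ** 2)


# Hypothetical parameter count, not taken from any specific U-Net.
NUM_PARAMS = 31_000_000

for precision in ("fp32", "fp16", "int8"):
    size = weight_memory_mib(NUM_PARAMS, precision)
    print(f"{precision}: {size:.1f} MiB")
```

Halving the bytes per weight halves the storage for weights, which is one reason reduced precision is commonly considered on memory-constrained devices; any accuracy impact has to be measured separately on the target task.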
You are tasked with creating a multimodal AI application that analyzes social media posts containing text, images, and user profile information to predict the likelihood of a post going viral. Which feature engineering techniques are most effective for representing and integrating these different modalities?
Correct answer: E
Consider the following Python code snippet used for processing image and text data for a multimodal model:

What is the primary limitation of the text encoding method used in this code, and how could it be improved for use in a real-world multimodal model?

Correct answer: B
You are developing a system to automatically generate image descriptions for visually impaired users. The system uses a combination of object detection, attribute recognition, and relationship extraction. However, the generated descriptions often lack detail and fail to capture the nuances of the image content. Which of the following strategies would MOST effectively address this limitation?
Correct answer: A
You're training a multimodal generative AI model that takes video and text as input to predict future frames of the video. You notice that the model generates plausible visual content but often fails to accurately reflect the actions described in the text. Which of the following techniques is MOST likely to improve the alignment between the generated video and the text description?
Correct answer: C