
NCA-GENM Free Practice Questions: NVIDIA Generative AI Multimodal

Page 1 of 31 (403 questions in total)


Question 1

Which of the following regularization techniques is MOST effective for preventing overfitting in a multimodal deep learning model with a large number of parameters and complex interactions between different modalities?

（A）L2 regularization

（B）Early Stopping

（C）L1 regularization

（D）Batch Normalization

（E）Dropout

Correct answer: E
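To illustrate the technique named in option E, the following is a minimal sketch of inverted dropout in plain Python. It is not part of the original question; the function name `dropout` and the toy activations are assumptions chosen for illustration. Each unit is zeroed with probability `p` during training and the survivors are rescaled by `1 / (1 - p)`, so no rescaling is needed at inference time.

```python
import random


def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if not training or p == 0.0:
        # At inference time the layer is the identity.
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)           # seeded for reproducibility
hidden = [1.0] * 10              # hypothetical hidden-layer activations
dropped = dropout(hidden, p=0.5, rng=rng)
```

Because a different random subset of units is silenced on every step, the network cannot rely on any single co-adapted feature, which is the usual intuition for why dropout helps large models generalize.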

Question 2

You are building a system to generate captions for images. You want to evaluate how well the generated captions describe the content of the images. Which of the following metrics are most suitable for evaluating the quality of image captions?

（A）F1-Score.

（B）Pixel Accuracy.

（C）ROUGE (Recall-Oriented Understudy for Gisting Evaluation).

（D）BLEU (Bilingual Evaluation Understudy).

（E）Inception Score.

Correct answers: C, D
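Both metrics in the answer are built on n-gram overlap between a generated caption and a reference caption. The sketch below is a simplified illustration, not the full BLEU or ROUGE definitions: it computes the clipped n-gram precision that BLEU is based on (without the brevity penalty or the geometric mean over n) and the n-gram recall behind ROUGE-N. The captions and function names are assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """BLEU-style clipped n-gram precision against a single reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: share of reference n-grams found in the candidate."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "a dog runs on the beach".split()   # hypothetical reference caption
candidate = "a dog runs on sand".split()        # hypothetical generated caption
p1 = modified_precision(candidate, reference, 1)  # 4 of 5 candidate unigrams match
r1 = rouge_n_recall(candidate, reference, 1)      # 4 of 6 reference unigrams match
```

Precision penalizes words the caption invents, while recall penalizes reference content the caption misses, which is why the two metrics are often reported together.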

Question 3

You are building an AI model that takes video and corresponding subtitles as input to generate short summaries of video content. Which of the following strategies are most important to reduce the chance of your model generating biased summaries? (Select all that apply)

（A）Randomly shuffle data during training.

（B）Ensure the training dataset contains diverse representation of all demographic groups and viewpoints.

（C）Use a pre-trained language model that has been debiased.

（D）Increase the number of training epochs.

（E）Evaluate the model's summaries on different demographic groups to identify and mitigate any disparities in performance.

Correct answers: B, C, E
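The per-group evaluation described in option E can be sketched as follows. This is a minimal illustration under stated assumptions: the quality scores, the group labels "A" and "B", and the helper names are all hypothetical, and a real audit would use a proper summary-quality metric and far more data.

```python
from collections import defaultdict


def per_group_mean(scores, groups):
    """Average a per-example quality score separately for each group."""
    totals, counts = defaultdict(float), defaultdict(int)
    for score, group in zip(scores, groups):
        totals[group] += score
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def max_disparity(group_means):
    """Gap between the best-served and worst-served group."""
    values = list(group_means.values())
    return max(values) - min(values)


# Hypothetical summary-quality scores and the group each video belongs to.
scores = [0.5, 0.75, 0.25, 0.5]
groups = ["A", "A", "B", "B"]

means = per_group_mean(scores, groups)
gap = max_disparity(means)
```

A large gap between groups is a signal to revisit the training data or the model before deployment; the threshold for "large" is a project decision and is not specified by the question.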

Question 4

You are training a conditional generative model to generate images based on text descriptions. You notice that the generated images often lack fine-grained details and tend to be blurry, even though the overall structure matches the text description. Which of the following techniques would be MOST effective in improving the image quality and adding finer details?

（A）Implement a perceptual loss function that compares high-level features of generated and real images.

（B）Increase the batch size used for training.

（C）Use a simpler generator architecture.

（D）Train the model for fewer epochs.

（E）Decrease the learning rate of the discriminator.

Correct answer: A

Question 5

You are building a multimodal model to generate realistic dialogues between virtual characters in a game. The model takes as input the current game state (including character positions, objects, and environment), the character's personality profile (text), and the previous dialogue utterances (text and audio). What specific techniques can you employ to ensure that the generated dialogues are contextually relevant, coherent, and emotionally appropriate?

（A）Use reinforcement learning to train the model to maximize a reward function that reflects the desired dialogue characteristics (e.g., coherence, emotional appropriateness).

（B）Train each modality separately to achieve the best result and then merge them at the end.

（C）All of the above, except D.

（D）Implement a hierarchical dialogue generation architecture that first plans the overall dialogue structure and then generates individual utterances.

（E）Incorporate attention mechanisms that allow the model to selectively focus on the most relevant aspects of the game state and character personality profile.

Correct answer: C
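The attention mechanism mentioned in option E can be illustrated with a minimal scaled dot-product attention sketch in plain Python. The query, keys, and values below are toy vectors standing in for encoded dialogue state and game-state features; they and the function names are assumptions, not part of the question.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the value vectors, one output dimension at a time.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Hypothetical embeddings: the query is the current dialogue turn, the keys
# and values are three encoded aspects of the game state.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

weights, context = attend(query, keys, values)
```

The key most aligned with the query receives the largest weight, so the resulting context vector is dominated by the most relevant part of the input.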

Question 6

You're using NVIDIA Triton to serve a multimodal model: a CLIP text encoder and a StyleGAN image generator. You need to ensure high throughput and minimal latency. Which Triton backend configuration is most suitable for this scenario, assuming both models are optimized for NVIDIA GPUs?

（A）Using just the Python backend with the models on CPU.

（B）A Python backend where both models are loaded into memory and inference is performed sequentially.

（C）A single model repository containing both models as TorchScript, served by a single Triton instance using the PyTorch backend.

（D）A single model repository with two model instances (CLIP as ONNX, StyleGAN as TensorRT) served by a single Triton instance, leveraging concurrent execution.

（E）Two separate model repositories, one for CLIP (as ONNX) and one for StyleGAN (as TensorRT), served by two Triton instances on different GPUs.

Correct answer: D

Question 7

Consider a scenario where you're training a generative AI model to create realistic images from text descriptions. You notice that the generated images lack fine-grained details and appear blurry. Which of the following loss functions or training techniques could you employ to improve the image quality and sharpness?

（A）Cross-entropy loss between the generated image and the text description.

（B）Increasing the batch size during training to improve gradient estimation.

（C）L1 loss between the generated image and the target image.

（D）Mean Squared Error (MSE) loss between the generated image and a downscaled version of the target image.

（E）Perceptual loss, which compares the feature representations of the generated and target images in a pre-trained CNN.

Correct answer: E
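The idea behind option E is to compare images in a feature space rather than pixel by pixel. The sketch below is a toy illustration only: a finite-difference edge detector on a 1-D signal stands in for the frozen pre-trained CNN that a real perceptual loss would use, and all names and values are assumptions made for the example.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def edge_features(signal):
    """Hypothetical stand-in for a frozen pre-trained CNN layer.

    Finite differences respond to sharp transitions, which is the kind of
    structure that intermediate CNN features also capture.
    """
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]


def perceptual_loss(generated, target, feature_fn=edge_features):
    """MSE measured in feature space instead of pixel space."""
    return mse(feature_fn(generated), feature_fn(target))


target = [0.0, 0.0, 1.0, 1.0]     # a sharp edge
blurry = [0.0, 0.25, 0.75, 1.0]   # the same edge, smoothed

pixel = mse(blurry, target)
feature = perceptual_loss(blurry, target)
```

In this toy case the feature-space loss is larger than the pixel loss for the blurred edge, which hints at why a feature-based term gives a stronger training signal against blur. In practice the feature extractor would be a pre-trained network such as VGG, and the perceptual term is typically combined with a pixel or adversarial loss.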

Question 8

You are analyzing a dataset of customer reviews for a new product using Natural Language Processing (NLP). The dataset contains both positive and negative reviews, but a significant portion of the negative reviews uses sarcasm. Which of the following NLP techniques would be MOST effective in accurately identifying the sentiment expressed in sarcastic reviews?

（A）Fine-tuning a pre-trained transformer model (e.g., BERT, RoBERTa) on a dataset of sarcastic and non-sarcastic reviews.

（B）Using a rule-based system that identifies keywords associated with positive or negative sentiment.

（C）Bag-of-words model with TF-IDF weighting.

（D）Sentiment lexicon-based approach.

（E）Calculating the average word embedding for each review.

Correct answer: A

Question 9

You are tasked with optimizing a U-Net model for real-time image segmentation on an embedded device with limited GPU memory. The original model is trained in FP32 precision. Which of the following techniques, when applied together, would likely yield the best trade-off between accuracy and performance?

（A）Weight clustering to reduce model size, pruning low-importance connections, and using a larger learning rate during fine-tuning.

（B）FP16 mixed-precision training, layer fusion to combine multiple operations into one, and increasing the batch size to improve GPU utilization.

（C）Converting all layers to FP16, removing skip connections from the U-Net architecture, and using a smaller input image resolution.

（D）Quantization Aware Training (QAT) to INT8, Knowledge Distillation from the FP32 model to a smaller student model, and channel pruning to reduce the number of filters.

（E）Applying standard post-training quantization to INT8, replacing convolutional layers with fully connected layers, and using a smaller batch size.

Correct answer: D
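Quantization Aware Training, named in option D, simulates INT8 rounding in the forward pass so the network learns weights that tolerate it. The following is a minimal sketch of that quantize-then-dequantize step ("fake quantization") using a symmetric per-tensor scale. The function name and the sample weights are assumptions, and real QAT frameworks also handle gradients (usually with a straight-through estimator), which this sketch omits.

```python
def fake_quantize(values, num_bits=8):
    """Symmetric per-tensor quantize-dequantize, as simulated during QAT."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values), 1.0
    scale = max_abs / qmax
    # Round to the nearest integer level and clamp to the representable range.
    levels = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # Map back to floats so the rest of the network sees the rounding error.
    return [q * scale for q in levels], scale


weights = [0.5, -1.27, 0.003, 1.0]   # hypothetical FP32 weights
dequantized, scale = fake_quantize(weights)
```

Each dequantized value differs from the original by at most half a quantization step, and small weights such as `0.003` collapse to zero, which is the kind of error the fine-tuning phase learns to compensate for.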

Question 10

You are tasked with creating a multimodal AI application that analyzes social media posts containing text, images, and user profile information to predict the likelihood of a post going viral. Which feature engineering techniques are most effective for representing and integrating these different modalities?

（A）Using TF-IDF for text, pixel values for images, and one-hot encoding for user profile information.

（B）Using a combination of TF-IDF for text, pixel values for images, and numerical features for user profile information. Then apply PCA for dimensionality reduction.

（C）Using character-level n-grams for text, edge detection for images, and boolean features for user profile information.

（D）Using bag-of-words for text, histogram of oriented gradients (HOG) for images, and simple numerical features (e.g., number of followers) for user profiles.

（E）Using word embeddings (e.g., Word2Vec, GloVe) for text, pre-trained CNN features (e.g., from ResNet, Inception) for images, and embedding user profiles using a graph embedding technique.

Correct answer: E

Question 11

Consider the following Python code snippet used for processing image and text data for a multimodal model:

What is the primary limitation of the text encoding method used in this code, and how could it be improved for use in a real-world multimodal model?

（A）The text encoding is overly complex and should be simplified to reduce computational overhead.

（B）The text encoding only supports ASCII characters and does not account for word embeddings or sequence length variations. Use a tokenizer like BERT or SentencePiece to generate embeddings and pad sequences to a fixed length.

（C）The text encoding is suitable for small datasets but will not scale to larger datasets.

（D）It adequately addresses the complexities inherent in natural language, making it suitable for a variety of multimodal models.

（E）The text encoding is efficient but incompatible with common deep learning architectures.

Correct answer: B
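The code snippet the question refers to is not included in the source, and the sketch below does not attempt to reconstruct it. It only illustrates the improvement described in option B: mapping tokens to vocabulary IDs and padding to a fixed length with an attention mask. The whitespace tokenizer is a simplified, hypothetical stand-in for a subword tokenizer such as the ones used with BERT or SentencePiece, and all names here are assumptions.

```python
PAD_ID, UNK_ID = 0, 1


def build_vocab(texts):
    """Assign an integer ID to every whitespace token seen in the corpus."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Convert text to fixed-length token IDs plus an attention mask."""
    ids = [vocab.get(token, UNK_ID) for token in text.lower().split()][:max_len]
    padding = max_len - len(ids)
    mask = [1] * len(ids) + [0] * padding   # 1 = real token, 0 = padding
    return ids + [PAD_ID] * padding, mask


vocab = build_vocab(["a red car", "a blue bike"])
ids, mask = encode("a green car", vocab, max_len=5)
```

Unknown words map to a dedicated `<unk>` ID instead of failing, and every sequence has the same length, so batches can be stacked into a single tensor. The IDs would then index into a learned embedding table in the model.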

Question 12

You are developing a system to automatically generate image descriptions for visually impaired users. The system uses a combination of object detection, attribute recognition, and relationship extraction. However, the generated descriptions often lack detail and fail to capture the nuances of the image content. Which of the following strategies would MOST effectively address this limitation?

（A）Combine B and C.

（B）Increase the size of the training dataset for the object detection model.

（C）Incorporate visual attention mechanisms that allow the description generation model to focus on the most salient regions of the image.

（D）Use a more powerful transformer-based model (e.g., GPT-3) to generate the image descriptions from the extracted object, attribute, and relationship information.

（E）Manually rewrite a subset of descriptions to be more in line with the requirements.

Correct answer: A

Question 13

You're training a multimodal Generative AI model that takes video and text as input to predict future frames of the video. You notice that the model generates plausible visual content but often fails to accurately reflect the actions described in the text. Which of the following techniques is MOST likely to improve the alignment between the generated video and the text description?

（A）Decrease the resolution of the video frames.

（B）Using only pretrained model weights.

（C）Implement a contrastive learning objective that encourages similar embeddings for corresponding video frames and text descriptions.

（D）Use a larger vocabulary for the text encoder.

（E）Increase the frame rate of the training videos.

Correct answer: C
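A contrastive objective of the kind described in option C can be sketched with an InfoNCE-style loss. This minimal example covers only the video-to-text direction (symmetric variants add the text-to-video direction as well), and the 2-D embeddings, the temperature value, and the function names are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(video_embs, text_embs, temperature=0.1):
    """Mean video-to-text InfoNCE loss; text i is the positive for video i."""
    losses = []
    for i, video in enumerate(video_embs):
        logits = [cosine(video, text) / temperature for text in text_embs]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching text as the correct class.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical embeddings for two video clips and their descriptions.
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(videos, [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce(videos, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is close to zero when each clip is most similar to its own description and grows when the pairs are mismatched, so minimizing it pulls matching video and text embeddings together and pushes non-matching ones apart.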
