You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have no dfs.hosts entry(ies) in your hdfs-site.xml configuration file. You configure a new worker node by setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you start the DataNode daemon on that worker node.
What do you have to do on the cluster to allow the worker node to join, and start storing HDFS blocks?
A. Nothing; the worker node will automatically join the cluster when the DataNode daemon is started.
B. Without creating a dfs.hosts file or making any entries, run the command hadoop dfsadmin -refreshHadoop on the NameNode
C. Create a dfs.hosts file on the NameNode, add the worker node's name to it, then issue the command
hadoop dfsadmin -refreshNodes on the NameNode
D. Restart the NameNode
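To make the role of dfs.hosts in this question concrete, here is a minimal sketch of the include-list check. It is a simplified model rather than Hadoop's actual code: the function name and host names are hypothetical, and it assumes only the documented behavior that an unset or empty include list permits every DataNode to register.

```python
def datanode_may_register(hostname, include_hosts):
    """Simplified model of the dfs.hosts include list (not Hadoop's real code).

    include_hosts is the list of host names read from the file that
    dfs.hosts points to; an empty list models "dfs.hosts is not set".
    """
    # With no include list configured, no DataNode is turned away.
    if not include_hosts:
        return True
    # Otherwise only the listed hosts may register with the NameNode.
    return hostname in include_hosts


# Hypothetical host names, for illustration only.
print(datanode_may_register("worker42.example.com", []))
print(datanode_may_register("worker42.example.com", ["worker01.example.com"]))
```

Under this model, a list only needs refreshing (hadoop dfsadmin -refreshNodes) when an include file is actually in use.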
You want to clean up this list by removing jobs where the state is KILLED. What command do you enter?
A. yarn application -kill application_1374638600275_0109
B. yarn rmadmin -refreshQueue
C. yarn application -refreshJobHistory
D. yarn rmadmin -kill application_1374638600275_0109
Reference: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_using-apache-hadoop/content/common_mrv2_commands.html
Assuming a cluster running HDFS, MapReduce version 2 (MRv2) on YARN with all settings at their default, what do you need to do when adding a new slave node to a cluster?
A. Nothing, other than ensuring that DNS (or /etc/hosts files on all machines) contains an entry for the
B. Restart the NameNode and ResourceManager daemons and resubmit any running jobs
C. Increase the value of dfs.number.of.needs in hdfs-site.xml
D. Add a new entry to /etc/nodes on the NameNode host.
E. Restart the NameNode daemon.
You have a 20-node Hadoop cluster, with 18 slave nodes and 2 master nodes running HDFS High Availability (HA). You want to minimize the chance of data loss in your cluster. What should you do?
A. Add another master node to increase the number of nodes running the JournalNode which increases
the number of machines available to HA to create a quorum
B. Configure the cluster's disk drives with an appropriate fault tolerant RAID level
C. Run the ResourceManager on a different master from the NameNode in order to load share HDFS
D. Run a Secondary NameNode on a different master from the NameNode in order to provide
automatic recovery from a NameNode failure
E. Set an HDFS replication factor that provides data redundancy, protecting against failure
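Option A refers to the JournalNode quorum. As a rough sketch of the majority-quorum arithmetic behind Quorum-based Storage (the function names are hypothetical, and this models only the counting, not the JournalNode protocol itself):

```python
def quorum_size(journal_nodes):
    """Smallest majority of the JournalNode ensemble."""
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes):
    """How many JournalNodes can fail while a majority is still reachable."""
    return (journal_nodes - 1) // 2


for n in (3, 4, 5):
    print(n, "JournalNodes: quorum", quorum_size(n),
          "tolerates", tolerated_failures(n), "failure(s)")
```

Going from three to four JournalNodes does not raise the number of tolerated failures, which is why odd ensemble sizes are usually chosen. Note that this quorum protects the edit log, not the HDFS block data itself.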
You decide to create a cluster which runs HDFS in High Availability mode with automatic failover, using Quorum-based Storage. What is the purpose of ZooKeeper in such a configuration?
A. It manages the Edits file, which is a log of changes to the HDFS filesystem.
B. It monitors an NFS mount point and reports if the mount point disappears
C. It both keeps track of which NameNode is Active at any given time, and manages the Edits file, which is
a log of changes to the HDFS filesystem
D. It only keeps track of which NameNode is Active at any given time
E. Clients connect to ZooKeeper to determine which NameNode is Active
Reference: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/PDF/CDH4-High-Availability-Guide.pdf (page 15)
During the execution of a MapReduce v2 (MRv2) job on YARN, where does the Mapper place the intermediate data of each Map task?
A. The Mapper stores the intermediate data on the node running the job's ApplicationMaster so that it is
available to YARN's ShuffleService before the data is presented to the Reducer
B. The Mapper stores the intermediate data in HDFS on the node where the Map tasks ran in the
usercache/${user}/appcache/application_${appid} directory for the user who ran the job
C. YARN holds the intermediate data in the NodeManager's memory (a container) until it is transferred to
D. The Mapper stores the intermediate data on the underlying filesystem of the local disk in the directories
E. The Mapper transfers the intermediate data immediately to the Reducers as it is generated by the Map
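The usercache/.../application_... directory pattern mentioned in option B is, in stock YARN, a layout beneath each directory listed in yarn.nodemanager.local-dirs on the node's local filesystem. A minimal sketch of how those per-application paths are composed follows. The helper name, the local-dirs values, and the user name are assumptions made for illustration; the application ID is borrowed from an earlier question.

```python
import posixpath


def app_local_dirs(local_dirs, user, app_id):
    """Build the per-application working directories under each local dir.

    local_dirs models the comma-separated yarn.nodemanager.local-dirs value,
    already split into a list. These are local-disk paths, not HDFS paths.
    """
    return [
        posixpath.join(d, "usercache", user, "appcache", app_id)
        for d in local_dirs
    ]


# Hypothetical local dirs and user name.
for path in app_local_dirs(
    ["/data/1/yarn/local", "/data/2/yarn/local"],
    "alice",
    "application_1374638600275_0109",
):
    print(path)
```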
Which YARN daemon or service monitors a Container's per-application resource usage (e.g., memory, CPU)?
C. ApplicationManagerService
D. ResourceManager
Reference: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-184.108.40.206/bk_using-apache-hadoop/content/ch_using-apache-hadoop-4.html (4th para)
You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from a faster network fabric?
A. When your workload generates a large amount of output data, significantly larger than amount
B. When your workload generates a large amount of intermediate data, on the order of the input data
C. When your workload consumes a large amount of input data, relative to the entire capacity of HDFS
D. When your workload consists of processor-intensive tasks
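Intermediate data has to cross the network during the shuffle, so its volume is what a faster fabric affects most. The back-of-the-envelope sketch below compares transfer times at 1 and 10 Gigabit Ethernet. The function name, the 80% link-efficiency figure, and the 1 TB data size are all assumptions, and treating the shuffle as a single link is a deliberate simplification.

```python
def transfer_seconds(data_bytes, link_gbps, efficiency=0.8):
    """Estimate the time to move data_bytes over one link.

    efficiency is an assumed fraction of the nominal line rate that is
    actually usable for payload; real clusters vary.
    """
    usable_bytes_per_second = link_gbps * 1e9 / 8 * efficiency
    return data_bytes / usable_bytes_per_second


shuffle_bytes = 1e12  # assumed: 1 TB of intermediate data
for gbps in (1, 10):
    minutes = transfer_seconds(shuffle_bytes, gbps) / 60
    print(f"{gbps:>2} GbE: about {minutes:.0f} minutes")
```

A processor-bound job with little intermediate data would see almost none of this difference.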
For each YARN job, the Hadoop framework generates task log files. Where are Hadoop's task log files stored?
A. In HDFS, in the directory of the user who generates the job
B. On the local disk of the slave node running the task
C. Cached in the YARN container running the task, then copied into HDFS on job completion
D. Cached by the NodeManager managing the job containers, then written to a log directory on the
Answer: B
You use the hadoop fs -put command to add a file "sales.txt" to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes in your cluster (with a replication factor of 3). One of the nodes holding this file (a single block) fails. How will the cluster handle the replication of this file in this situation?
A. The cluster will re-replicate the file the next time the system administrator reboots the NameNode
daemon (as long as the file's replication doesn't fall below two)
B. This file will be immediately re-replicated and all other HDFS operations on the cluster will halt until the
cluster's replication values are restored
C. The file will remain under-replicated until the administrator brings that node back online
D. The file will be re-replicated automatically after the NameNode determines it is under replicated based
on the block reports it receives from the DataNodes
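The bookkeeping described in option D can be sketched as counting live replicas per block across the reports from DataNodes that are still alive. This is a simplified model under stated assumptions, not the NameNode's implementation: the function name, the DataNode names, and the block ID are hypothetical.

```python
from collections import Counter


def under_replicated(block_reports, target_replication):
    """Return {block_id: (live_replicas, wanted)} for blocks below target.

    block_reports: dict mapping each live DataNode to the set of block IDs
    it reported. target_replication: dict mapping block ID to the desired
    replica count.
    """
    live = Counter()
    for blocks in block_reports.values():
        live.update(blocks)
    return {
        block: (live[block], wanted)
        for block, wanted in target_replication.items()
        if live[block] < wanted
    }


# dn3 has failed, so only two DataNodes still report the block of sales.txt.
reports = {"dn1": {"blk_sales"}, "dn2": {"blk_sales"}}
print(under_replicated(reports, {"blk_sales": 3}))
```

Any block returned by such a check would be scheduled for copying to another DataNode, without operator action and without pausing other HDFS operations.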