From: sowmya trainer
Newsgroups: comp.lang.java.help
Subject: Top 20 Hadoop and Map Reduce Interview Questions
Date: Mon, 7 Mar 2016 22:29:02 -0800 (PST)

1) What is Hadoop MapReduce?

The Hadoop MapReduce framework is used to process large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step map and reduce process.

2) How does Hadoop MapReduce work?

Take word counting as an example: the map phase counts the words in each document, while the reduce phase aggregates those per-document counts across the entire collection. During the map phase the input data is divided into splits, which are analyzed by map tasks running in parallel across the Hadoop cluster.

3) Explain what shuffling is in MapReduce.

The process by which the system sorts the map outputs and transfers them to the reducers as inputs is known as the shuffle.

4) Explain what the distributed cache is in the MapReduce framework.

The distributed cache is an important feature provided by the MapReduce framework. When you want to share some files across all nodes in a Hadoop cluster, DistributedCache is used. The files could be executable JAR files or simple properties files.

5) Explain what the NameNode is in Hadoop.

The NameNode is the node where Hadoop stores all the file location information for HDFS (Hadoop Distributed File System). In other words, the NameNode is the centrepiece of an HDFS file system. It keeps a record of all the files in the file system and tracks where the file data is kept across the cluster or multiple machines.

6) Explain what the JobTracker is in Hadoop. What actions does Hadoop follow?

In Hadoop, the JobTracker is used for submitting and tracking MapReduce jobs. The JobTracker runs in its own JVM process.

Hadoop performs the following actions:

* Client applications submit jobs to the JobTracker
* The JobTracker communicates with the NameNode to determine the data location
* The JobTracker locates TaskTracker nodes that are near the data or have available slots
* It submits the work to the chosen TaskTracker nodes
* When a task fails, the JobTracker is notified and decides what to do next
* The TaskTracker nodes are monitored by the JobTracker

7) Explain what a heartbeat is in HDFS.

A heartbeat is a signal sent from a DataNode to the NameNode, and from a TaskTracker to the JobTracker. If the NameNode or JobTracker stops receiving the signal, it assumes there is a problem with that DataNode or TaskTracker.

8) Explain what combiners are and when you should use a combiner in a MapReduce job.

Combiners are used to increase the efficiency of a MapReduce program. A combiner reduces the amount of data that needs to be transferred across to the reducers.
If the operation performed is commutative and associative, you can use your reducer code as a combiner. The execution of a combiner is not guaranteed in Hadoop.

9) What happens when a DataNode fails?

When a DataNode fails:

* The JobTracker and NameNode detect the failure
* All tasks on the failed node are re-scheduled
* The NameNode replicates the user's data to another node

10) Explain what speculative execution is.

During speculative execution in Hadoop, a certain number of duplicate tasks are launched. Multiple copies of the same map or reduce task can be executed on different slave nodes. In simple words, if a particular node is taking a long time to complete a task, Hadoop will create a duplicate task on another node. The copy that finishes first is retained, and the copies that do not finish first are killed.

11) Explain what the basic parameters of a Mapper are.

The basic parameters of a Mapper, for a typical text-processing job, are:

* LongWritable and Text (input key and value)
* Text and IntWritable (output key and value)

12) Explain what the function of the MapReduce partitioner is.

The function of the MapReduce partitioner is to make sure that all the values of a single key go to the same reducer, which in turn helps distribute the map output evenly over the reducers.

13) Explain the difference between an Input Split and an HDFS Block.

The logical division of data is known as a Split, while the physical division of data is known as an HDFS Block.

14) Explain what happens in TextInputFormat.

In TextInputFormat, each line in the text file is a record. The value is the content of the line, while the key is the byte offset of the line. For instance, Key: LongWritable, Value: Text.

15) Mention the main configuration parameters that the user needs to specify to run a MapReduce job.

The user of the MapReduce framework needs to specify:

* The job's input locations in the distributed file system
* The job's output location in the distributed file system
* The input format
* The output format
* The class containing the map function
* The class containing the reduce function
* The JAR file containing the mapper, reducer and driver classes

16) Explain what WebDAV is in Hadoop.

WebDAV is a set of extensions to HTTP that supports editing and updating files. On most operating systems WebDAV shares can be mounted as filesystems, so it is possible to access HDFS as a standard filesystem by exposing HDFS over WebDAV.

17) Explain what Sqoop is in Hadoop.

Sqoop is a tool used to transfer data between a relational database management system (RDBMS) and Hadoop HDFS. Using Sqoop, data can be imported from an RDBMS like MySQL or Oracle into HDFS, as well as exported from HDFS files to an RDBMS.

18) Explain how the JobTracker schedules a task.

The TaskTrackers periodically send heartbeat messages to the JobTracker to reassure it that they are still active and functioning. The message also informs the JobTracker about the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated.

19) Explain what SequenceFileInputFormat is.

SequenceFileInputFormat is used for reading sequence files. A sequence file is a specific compressed binary file format which is optimized for passing data between the output of one MapReduce job and the input of another MapReduce job.

20) Explain what conf.setMapperClass does.

conf.setMapperClass sets the mapper class and everything related to the map job, such as reading the data and generating key-value pairs out of the mapper.

For more details please go through the website:
http://techdatasolution.in/hadoop-trainingmumbai.html

Contact me: Info@techdatasolution.in
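
As an illustration of questions 2, 3, 8 and 12 above, here is a minimal in-memory sketch of the map, combine, partition, shuffle and reduce steps for word counting. It is plain Java (9 or later) and does not use any Hadoop API; the class name WordCountSketch and all method names are made up for this example. The partition function uses a hash-modulo scheme in the style of Hadoop's default HashPartitioner, and the combiner simply pre-sums counts, which is safe because addition is commutative and associative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the MapReduce data flow; it does not use any Hadoop API.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Combiner: pre-sum the counts of one map task before anything is transferred.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            local.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return new ArrayList<>(local.entrySet());
    }

    // Partitioner: the same key always maps to the same reducer index.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Shuffle: route every pair to its reducer, grouping values by sorted key.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs, int numReducers) {
        List<TreeMap<String, List<Integer>>> reducerInputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducerInputs.add(new TreeMap<>());
        }
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> p : output) {
                reducerInputs.get(partition(p.getKey(), numReducers))
                        .computeIfAbsent(p.getKey(), k -> new ArrayList<>())
                        .add(p.getValue());
            }
        }
        return reducerInputs;
    }

    // Reduce phase: sum all values that arrived for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    // Driver: map and combine each split, shuffle, then reduce each partition.
    static Map<String, Integer> run(List<String> splits, int numReducers) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String split : splits) {
            mapOutputs.add(combine(map(split)));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> input : shuffle(mapOutputs, numReducers)) {
            result.putAll(reduce(input));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("the quick brown fox", "the lazy dog and the fox");
        System.out.println(run(splits, 2));
        // prints {and=1, brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
    }
}
```

In the second split the word "the" appears twice, so the combiner sends a single (the, 2) pair instead of two (the, 1) pairs. A real job would express the same steps through Hadoop's Mapper, Reducer and Partitioner classes rather than plain collections.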