Monday, December 28, 2020

Hadoop Questions and Answers – Compression

This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Compression”.

1. The _________ codec from Google provides modest compression ratios.
a) Snapcheck
b) Snappy
c) FileCompress
d) None of the mentioned

Answer: b
Explanation: Snappy, developed at Google, aims for fast compression and decompression speeds rather than maximum compression, so its compression ratios are modest.

2. Point out the correct statement.
a) Snappy is licensed under the GNU General Public License (GPL)
b) BgCIK needs to create an index when it compresses a file
c) The Snappy codec is integrated into Hadoop Common, a set of common utilities that supports other Hadoop subprojects
d) None of the mentioned

Answer: c
Explanation: The Snappy codec is integrated into Hadoop Common. For Hadoop versions that do not yet provide Snappy codec support, you can use Snappy as an add-on.

3. Which of the following compression codecs is similar to Snappy?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

Answer: a
Explanation: Like Snappy, LZO favors speed over compression ratio. LZO is only really desirable if you need to compress text files.

4. Which of the following supports splittable compression?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

Answer: a
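Explanation: LZO supports splittable compression once an index has been built for the file, which enables the parallel processing of compressed text file splits by your MapReduce jobs. (Bzip2 files are also splittable; Gzip files are not.)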
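
The idea behind splittable compression can be sketched in Python. This is only an illustration under stated assumptions: Python's standard library has no LZO codec, so zlib stands in for it, and the helper names `compress_blocks` and `read_block` as well as the index layout are hypothetical. The point is that when data is compressed as independent blocks and the block offsets are recorded, any block can be decompressed without reading the ones before it, which is what lets separate tasks work on separate splits.

```python
import zlib

# Assumption: block size chosen to mirror the ~256K LZO blocks from question 10.
BLOCK_SIZE = 256 * 1024


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress data as independent blocks and return (blob, index).

    index is a list of (offset, length) pairs locating each compressed
    block inside blob.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        block = zlib.compress(data[start:start + block_size])
        index.append((len(blob), len(block)))
        blob.extend(block)
    return bytes(blob), index


def read_block(blob: bytes, index, i: int) -> bytes:
    """Decompress only block i, without touching any other block."""
    offset, length = index[i]
    return zlib.decompress(blob[offset:offset + length])


# Hypothetical sample data: newline-delimited text records.
data = b"".join(b"record %d\n" % n for n in range(100_000))
blob, index = compress_blocks(data)

# A worker assigned to the third split needs only that block.
third_split = read_block(blob, index, 2)
print(len(index), "blocks;", len(third_split), "bytes in block 2")
```

A single Gzip stream has no such block boundaries to seek to, so a reader has to decompress it from the start.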

5. Point out the wrong statement.
a) From a usability standpoint, LZO and Gzip are similar
b) Bzip2 generates a better compression ratio than does Gzip, but it’s much slower
c) Gzip is a compression utility that was adopted by the GNU project
d) None of the mentioned

Answer: a
Explanation: From a usability standpoint, Bzip2 and Gzip are similar.

6. Which of the following is the slowest compression technique?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

Answer: b
Explanation: Of all the available compression codecs in Hadoop, Bzip2 is by far the slowest.
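
The speed and ratio trade-off between Gzip and Bzip2 can be checked informally with Python's standard `gzip` and `bz2` modules. This is a rough sketch, not a benchmark: the sample input is made up for illustration, the helper `measure` is hypothetical, and the numbers depend on the data, the compression level, and the machine.

```python
import bz2
import gzip
import time

# Hypothetical sample input: repetitive log-like text, for illustration only.
sample = b"".join(
    b"2020-12-28 INFO request id=%d status=200\n" % n for n in range(50_000)
)


def measure(name, compress):
    """Time one compression call and report the compressed size."""
    start = time.perf_counter()
    packed = compress(sample)
    elapsed = time.perf_counter() - start
    return {"codec": name, "bytes": len(packed), "seconds": elapsed}


results = [measure("gzip", gzip.compress), measure("bzip2", bz2.compress)]
for r in results:
    print(f"{r['codec']:>5}: {r['bytes']} bytes in {r['seconds']:.3f}s")
```

Running it on your own files gives a better picture than any single synthetic input.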

7. Gzip (short for GNU zip) generates compressed files that have a _________ extension.
a) .gzip
b) .gz
c) .gzp
d) .g

Answer: b
Explanation: Gzip writes compressed files with a .gz extension. You can use the gunzip command to decompress files that were created by a number of compression utilities, including Gzip.
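
As a small illustration, Python's standard `gzip` module can write and read a `.gz` file in the same way the gzip and gunzip commands would. The file name `sample.txt.gz` and the text are made up for this sketch, and the file lives in a temporary directory that is deleted afterwards.

```python
import gzip
import os
import tempfile

text = "line one\nline two\n"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the .gz suffix follows the gzip convention.
    path = os.path.join(tmp, "sample.txt.gz")

    # Write compressed text (roughly what `gzip sample.txt` produces).
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)

    # Read it back (roughly what `gunzip -c sample.txt.gz` prints).
    with gzip.open(path, "rt", encoding="utf-8") as f:
        restored = f.read()

    file_name = os.path.basename(path)

print(file_name, restored == text)
```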

8. Which of the following is based on the DEFLATE algorithm?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

Answer: c
Explanation: gzip is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman Coding.
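
The link between gzip and DEFLATE is visible in the file format itself: the third byte of a gzip stream is the compression method field, and the value 8 denotes DEFLATE. A minimal Python sketch using the standard `gzip` and `zlib` modules (the sample payload is made up for illustration):

```python
import gzip
import zlib

original = b"hadoop " * 1000
packed = gzip.compress(original)

# Byte 2 of the gzip header is the compression method; 8 means DEFLATE.
method = packed[2]

# zlib implements DEFLATE and can decode the gzip wrapper directly
# when 16 is added to wbits.
restored = zlib.decompress(packed, wbits=zlib.MAX_WBITS | 16)

print(method, restored == original)
```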

9. __________ typically compresses files to within 10% to 15% of the best available techniques.
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

Answer: b
Explanation: bzip2 is a freely available, patent-free, high-quality data compressor.

10. The LZO compression format is composed of approximately __________ blocks of compressed data.
a) 128k
b) 256k
c) 24k
d) 36k

Answer: b
Explanation: An LZO file is made up of many small (approximately 256K) blocks of compressed data. LZO was designed with speed in mind: it decompresses about twice as fast as gzip, meaning it’s fast enough to keep up with hard drive read speeds.
