Cheap and Secure Web Hosting Provider : See Now

[Solved]: How to understand the storing mechanism used in external merge sort

, , No Comments
Problem Detail: 

I was reading about external merge sort from the wikipedia article link, according to it:

External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive). External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file.

Suppose we have a RAM which can only hold 2 chunks of data and we have 6 chunks of data to sort. Please see the below diagram:


Since our memory can hold 2 chunks of data so the first step 1 sounds plausible since we are sorting only pairs of numbers (5,6), (3,4) (1,2). In the step 2 we merge the data and now our chunk size is 4. My question is how do you now load this 4 chunk of data into memory now? Since your memory cannot accept more than 2 chunks of data then how do you load and sort them? How did you sort while merging chunks of data here? I have visited several links, but not able to understand this concept.

You must be performing some kind of sorting while merging the data as well right?

Asked By : python

Answered By : D.W.

How do you load that entire list into memory? You don't. You have a bit of a misunderstanding about how Merge works, in this setting.

It's tempting to think that Merge should work as follows: load both input lists into memory, in their entirety; merge them; then write the result back out to storage. But that wouldn't work -- we won't have enough memory to load the entire lists into memory.

Instead, Merge works without loading either input list into memory all at once. Take a look at the pseudocode for Merge, and you'll see that it only needs to have two list items in memory at a time: one list node from somewhere in the first list, and one from somewhere in the second node. Merge finds which one of those two items is smaller, outputs it, and then loads the next item from the input list it came from. You can see that Merge only needs $O(1)$ memory to merge two sorted lists, even if the sorted lists are stored in external storage -- it never needs to load the input lists into memory in their entirety.

Best Answer from StackOverflow

Question Source :

 Ask a Question

 Download Related Notes/Documents


Post a Comment

Let us know your responses and feedback