My Multithreaded Merge Sort Algorithm takes 10+ minutes to s

Whether you're a newbie or an experienced programmer, any questions, help, or just talk of any language will be welcomed here.

Moderator: Coders of Rage

Post Reply
User avatar
davidthefat
Chaos Rift Maniac
Chaos Rift Maniac
Posts: 529
Joined: Mon Nov 10, 2008 3:51 pm
Current Project: Fully Autonomous Robot
Favorite Gaming Platforms: PS3
Programming Language of Choice: C++
Location: California
Contact:

My Multithreaded Merge Sort Algorithm takes 10+ minutes to s

Post by davidthefat »

So this is for my AP Computer Science class in high school (equivalent to 2nd semester of Comp Sci 101) that focuses on data structures. So we have this little contest on who can make the fastest sorting algorithm. This current algorithm only takes 450 ms for sorting 20,000 strings. I do not know how that is compared to others, but seems like I'm on the right track. It handles 100,000 strings fine; less than 7 seconds (all these numbers include reading and writing the files) However, it just is a slug when it comes to 1 million strings. I tried to minimize the copying of data and just kept it to accessing using pointers (From what I've been told, all objects in java are pointers) Is this a nature of merge sort in itself or is it a flaw in my methodology?
Keep in mind, my system is a quad core cpu so I purposely made it only have 4 active running threads per file being sorted at any given time, but I've heard that the rule of thumb is 1.5 * the # of cores. The overhead from initiating the threads are negligible IMHO. It takes about 1/4 the time a single threaded merge sort does on my PC. And I am very aware that I should be commenting my file... Just a habit of never commenting for years.
I purposely did not join the sort threads until I started all of them by design. I know that it would be sluggish around 5+ files, but I will be testing one file at a time.
My sort method: [1] http://pastebin.com/YwHU4BPz
My "main" file: [2] http://pastebin.com/9kj4fpXL
User avatar
short
ES Beta Backer
ES Beta Backer
Posts: 548
Joined: Thu Apr 30, 2009 2:22 am
Current Project: c++, c
Favorite Gaming Platforms: SNES, PS2, SNES, SNES, PC NES
Programming Language of Choice: c, c++
Location: Oregon, US

Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes

Post by short »

ok your going for speed ..... are you forced to write it in java...?

(you know somebody was going to ask, might as well be me)
My github repository contains the project I am currently working on,
link: https://github.com/bjadamson
Rebornxeno
Chaos Rift Cool Newbie
Chaos Rift Cool Newbie
Posts: 85
Joined: Thu Jun 23, 2011 11:12 am

Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes

Post by Rebornxeno »

A little overhead really adds up if your making threads for everything you sort :(
User avatar
davidthefat
Chaos Rift Maniac
Chaos Rift Maniac
Posts: 529
Joined: Mon Nov 10, 2008 3:51 pm
Current Project: Fully Autonomous Robot
Favorite Gaming Platforms: PS3
Programming Language of Choice: C++
Location: California
Contact:

Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes

Post by davidthefat »

short wrote:ok your going for speed ..... are you forced to write it in java...?

(you know somebody was going to ask, might as well be me)
Yes, I am forced to write in java because the class is taught in java.
User avatar
LeonBlade
Chaos Rift Demigod
Chaos Rift Demigod
Posts: 1314
Joined: Thu Jan 22, 2009 12:22 am
Current Project: Trying to make my first engine in C++ using OGL
Favorite Gaming Platforms: PS3
Programming Language of Choice: C++
Location: Blossvale, NY

Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes

Post by LeonBlade »

short wrote:ok your going for speed ..... are you forced to write it in java...?

(you know somebody was going to ask, might as well be me)
:lol:
There's no place like ~/
User avatar
GroundUpEngine
Chaos Rift Devotee
Chaos Rift Devotee
Posts: 835
Joined: Sun Nov 08, 2009 2:01 pm
Current Project: mixture
Favorite Gaming Platforms: PC
Programming Language of Choice: C++
Location: UK

Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes

Post by GroundUpEngine »

davidthefat wrote:
short wrote:ok your going for speed ..... are you forced to write it in java...?

(you know somebody was going to ask, might as well be me)
Yes, I am forced to write in java because the class is taught in java.
The joys of life lol ;)
User avatar
Ginto8
ES Beta Backer
ES Beta Backer
Posts: 1064
Joined: Tue Jan 06, 2009 4:12 pm
Programming Language of Choice: C/C++, Java

Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes

Post by Ginto8 »

Since the performance is fine up until 1 million strings, I'm guessing you're running into memory problems, and the computer is starting to use (uber-slow) swap space. That, or the function stack is getting a little too deep and is slowing things down. I think it's a space complexity issue though.

Edit: taking a closer look at your implementation, I notice 2 things: first, that your method, run on a large data set, would create a HUGE amount of threads. And second, that these threads are... well, useless. If all you're doing is starting the threads then waiting for them to end, why not just have them be non-threaded methods? I think if you removed the threading you could actually make the algorithm significantly faster.
Quit procrastinating and make something awesome.
Ducky wrote:Give a man some wood, he'll be warm for the night. Put him on fire and he'll be warm for the rest of his life.
User avatar
Light-Dark
Dreamcast Developer
Dreamcast Developer
Posts: 307
Joined: Sun Mar 13, 2011 7:57 pm
Current Project: 2D RPG & NES Platformer
Favorite Gaming Platforms: NES,SNES,N64,Genesis,Dreamcast,PC,Xbox360
Programming Language of Choice: C/++
Location: Canada

Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes

Post by Light-Dark »

uh-oh Java, well Java doesn't do well performance wise i at least have hope that you are also learning C/++ so you may be forgiven for your transgressions?
<tpw_rules> LightDark: java is a consequence of inverse moore's law: every 18 months, the average program will be twice as slow. therefore, computers always run at the same percevied speed. java's invention was a monumental step
Image
User avatar
davidthefat
Chaos Rift Maniac
Chaos Rift Maniac
Posts: 529
Joined: Mon Nov 10, 2008 3:51 pm
Current Project: Fully Autonomous Robot
Favorite Gaming Platforms: PS3
Programming Language of Choice: C++
Location: California
Contact:

Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes

Post by davidthefat »

Ginto8 wrote:Since the performance is fine up until 1 million strings, I'm guessing you're running into memory problems, and the computer is starting to use (uber-slow) swap space. That, or the function stack is getting a little too deep and is slowing things down. I think it's a space complexity issue though.

Edit: taking a closer look at your implementation, I notice 2 things: first, that your method, run on a large data set, would create a HUGE amount of threads. And second, that these threads are... well, useless. If all you're doing is starting the threads then waiting for them to end, why not just have them be non-threaded methods? I think if you removed the threading you could actually make the algorithm significantly faster.
If you have not noticed, it actually does not spawn more than 6 threads other than the main and read file threads (and other overhead threads java has, if it has any)

Code: Select all

@Override
    public void run()
    {
        mergesort(0, num - 1, 2); //This makes it only spawn 6 threads; 2 threads that calls two each.
    }

Code: Select all

public static void mergesort(int l, int h, int i)
    {
        if (l < h)
        {
            int m = (l + h) / 2;
            if(i > 0) //Checks if that "i" in the argument is bigger than 0. It recursively decrements it two lines below. The else statement does not generate any threads.
            {
                merger t1 = new merger(l, m, i - 1);
                merger t2 = new merger(m + 1, h, i - 1);
                t1.start();
                t2.start();
                try
                {
                    t1.join();
                    t2.join();
                }
                catch(InterruptedException e)
                {
                    System.out.println(e.getMessage());
                }
            }
            else
            {
                mergesort(l, m, 0);
                mergesort(m + 1, h, 0);
            }
Light-Dark wrote:uh-oh Java, well Java doesn't do well performance wise i at least have hope that you are also learning C/++ so you may be forgiven for your transgressions?
Do not worry; I know C++. In fact, the teacher almost did not let me take the class because I used C++ all year in his class two years ago... Just a fun fact. Why I am taking the class again is because he is teaching data structures and algorithms to people who already too the class.



edit: I'm a freaking idiot... I knew constructing the array of strings every iteration will **** me up... So I fixed it; it runs like a charm. 1 million strings now take ~790 ms... Fixed Code: http://pastebin.com/Jqf4TVEk
Post Reply