Majority Element

Imports

With the exception of the defaultdict, everything imported is for testing.

# python standard library
from collections import defaultdict
from datetime import (
    datetime,
    timedelta,
    )
import random

# pypi
from expects import (
    equal,
    expect,
)
import numpy

The Majority Element Problem

Check whether a sequence contains an element that appears more than half of the time. Note that the problem doesn't ask if one of the elements appears more than all the other elements, but rather whether it appears more than half the time (it isn't a simple majority).

Input A sequence of n integers
Output 1 if there is a majority element, 0 otherwise.
Constraints \(1 \le n \le 10^5; 0 \le a_i \le 10^9\) for all \(1 \le i \le n\)

Constants

class Constraints:
    min_length = 1
    max_length = 10**5
    min_value = 1
    max_value = 10**9
    max_time = timedelta(seconds=5)
class Outcome:
    has_majority = 1
    no_majority = 0

Samples

These are values to test against. The initial cases are for the naive-implementation and the last two are for an implementation that is more efficient.

class TestKeys:
    votes = 'input'
    expected = 'output'
SAMPLES = dict(
    one={
	TestKeys.votes: [2, 3, 9, 2, 2],
	TestKeys.expected: 1,
    },
    two={
	TestKeys.votes: [1, 2, 3, 1],
	TestKeys.expected: 0,
    },
    three={
	TestKeys.votes: [random.randint(1, Constraints.max_value),],
	TestKeys.expected: 1,
	       }
)

vote = random.randint(1, Constraints.max_value)
SAMPLES["four"] = {
    TestKeys.votes: [vote, vote],
    TestKeys.expected: 1,
}
SAMPLES["five"] = {
    TestKeys.votes: [vote, vote + 1],
    TestKeys.expected: 0,
}

Now we're going to add two cases that have the maximum allowed number of values to make sure our solution can finish in a reasonable time.

half = Constraints.max_length//2

left_size = half + 10
right_size = half - 10

left = numpy.ones(left_size) * random.randrange(Constraints.max_value)
right = numpy.random.randint(Constraints.min_value,
		      Constraints.max_value + 1,
		      right_size)
three = numpy.concatenate((left, right))
SAMPLES["six"] = {
    TestKeys.votes: three,
    TestKeys.expected: 1,
}

This next case isn't necessarily guaranteed to be true (numpy might generate an array with one element that is in the majority), but I think that the chance of it failing is pretty close to zero.

SAMPLES["seven"] = {
    TestKeys.votes: numpy.random.randint(Constraints.min_value,
				  Constraints.max_value + 1,
				  Constraints.max_length),
    TestKeys.expected: 0,

}

The Naive Solution

This is a translation of the pseudocode given with the problem. It has a runtime of \(O(n^2)\)

def naive_majority(voters):
    """Decides if there is a majority element

    Args:
     voters: list of elements to check

    Returns:
     int: 1 if there is a majority element, 0 otherwise
    """
    half = len(voters)//2
    for index, voter in enumerate(voters):
	count = 0
	for other_voter in voters:
	    if voter == other_voter:
		count += 1
	if count > half:
	    return Outcome.has_majority
    return Outcome.no_majority

Now we can test if it is correct.

for sample in "one two three four five".split():
    values = SAMPLES[sample]
    actual = naive_majority(values[TestKeys.votes])
    expected = values[TestKeys.expected]
    print("{} actual: {} expected: {}".format(sample,
					      actual,
					      expected))

    expect(actual).to(equal(expected))

Although it looks correct the grader says it times out.

Failed case #11/21: time limit exceeded (Time used: 9.98/5.00, memory used: 21266432/536870912.)

Iterative Version

Although this is the divide-and conquer section, the more intuitive way for me is to just count and sort the items to see if the item with the most votes is the majority. The counting of the votes is \(O(n)\) and the sort adds \(O(n \log n)\).

def iterative_majority(votes):
    """Decides if there is a majority among the votes

    Args:
     votes (list): collection to check

    Returns:
     int: 1 if there is a majority, 0 otherwise
    """
    half = len(votes)//2
    counts = defaultdict(lambda: 0)
    for vote in votes:
	counts[vote] += 1

    sorted_counts = sorted((count for count in counts.values()), reverse=True)
    return (Outcome.has_majority if sorted_counts[0] > half
	    else Outcome.no_majority)
def test_implementation(implementation):
    """runs the implementation against the samples

    Args:
     implementation: callable to test

    Raises:
     AssertionError: answer wasn't the expected
    """
    for sample, values in SAMPLES.items():
	start = datetime.now()
	actual = implementation(values[TestKeys.votes])
	expected = values[TestKeys.expected]
	elapsed = datetime.now() - start
	print("({}) elapsed: {} actual: {} expected: {}".format(
	    sample,
	    elapsed,
	    actual,
	    expected))
	expect(actual).to(equal(expected))
	assert elapsed < Constraints.max_time
test_implementation(iterative_majority)

This version passes the grader.

Good job! (Max time used: 0.14/5.00, max memory used: 22638592/536870912.)

Iterative Two

We can get rid of the sort since only have to check the count. This reduces the runtime to \(O(n)\), although since the for-loop is now pure python it might not actually speed things up much.

def iterative_majority_two(votes):
    """Decides if there is a majority among the votes

    Args:
     votes (list): collection to check

    Returns:
     int: 1 if there is a majority, 0 otherwise
    """
    half = len(votes)//2
    counts = defaultdict(lambda: 0)
    for vote in votes:
	counts[vote] += 1

    for count in counts.values():
	if count > half:
	    return Outcome.has_majority
    return Outcome.no_majority
test_implementation(iterative_majority_two)

This one also passes the grader.

Good job! (Max time used: 0.14/5.00, max memory used: 22626304/536870912.)

It took exactly the same amount of time in the grader (although that might be because the time difference is less than their rounding), but used up a little less memory.