This post will define multisets and basic operations on multisets.
We’ll view union, intersection, inclusion and sum each from four perspectives:
-
- Examples with words
- Example with prime factorization
- Using Python’s multiset module
- Multisets as functions
Definition and examples
A multiset is like a set, except each element may appear more than once. We say each element has a multiplicity.
Example with words
For example, the set of letters in the word mississippi is {i, m, p, s} but the multiset of letters would be
{i, i, i, i, m, p, p, s, s, s, s}
As with sets, order doesn’t matter. We could write the multiset of letters in mississippi more compactly as
{i4, m1, p2, s4}
This looks a lot like a prime factorization, and indeed prime factorization will be our running example in this post.
Example with prime factors
The factors of 2022 form an ordinary set because
2022 = 2 × 3 × 337
but the factors of 20221026 form a multiset because
20221026 = 2 × 3 × 7² × 109 × 631,
i.e. the factor 7 has multiplicity 2 because 20221026 is divisible by 49.
Python example
The factorint
function in sympy
returns a multiset, represented as a dictionary. The keys are the prime factors and the values are the multiplicities.
>>> from sympy import factorint >>> factorint(2022) {2: 1, 3: 1, 337: 1} >>> factorint(20221026) {2: 1, 3: 1, 7: 2, 109: 1, 631: 1}
Suppose the members of our multisets come from a universal set A. A multiset can be represented as a function f from A to the set of non-negative integers, mapping each element to its multiplicit. In the case of prime factorizations, A is the set of prime numbers and f(p) is the exponent of p in the prime factorization of some number.
We will use the Python library multiset
to represent multisets. We could initialize a Multiset
object with the return value of a call to factorint
.
>>> from multiset import Multiset >>> M = Multiset(factorint(20221026)) >>> M Multiset({2: 1, 3: 1, 7: 2, 109: 1, 631: 1})
Union
The elements in the union of two multisets are the union of the elements of in each of the multisets, with multiplicity equal to the maximum from each multiset. An example will make this clearer.
Example with words
The union of
{m, i, s, s, i, s, s, i, p, p, i}
and
{m, a, s, s, a, c, h, u, s, e, t, t, s}
would be
{a, a, c, e, h, i, i, i, i, m, p, p, s, s, s, s, t, t, u}
Multiset unions don’t have to be ordered, but ordering them makes it easier to compute the union.
Example with Python
Here’s Python code to repeat the example above.
>>> MS = Multiset("mississippi") >>> MA = Multiset("massachusetts") >>> MS.union(MA) Multiset({'m': 1, 'i': 4, 's': 4, 'p': 2, 'a': 2, 'c': 1, 'h': 1, 'u': 1, 'e': 1, 't': 2})
Prime factor example
Let a and b be two non-negative integers. The union of a‘s prime factorization with b‘s prime factorization is the prime factorization of the least common multiple of a and b.
For example, let a = 12 and b = 10.
lcm(2² × 3, 2 × 5) = 2² × 3 × 5.
Function perspective
If the function f represents the multiset A and g represents the multiset B then the function max(f, g) represents the union of A and B.
Intersection
To form the intersection of two multisets, intersect the underlying sets, and set the multiplicities to the minimum of the multiplicities in each of the original multisets.
Example with words
The intersection of
{m, i, s, s, i, s, s, i, p, p, i}
and
{m, a, s, s, a, c, h, u, s, e, t, t, s}
would be
{m, s, s, s, s}
Example with Python
Reusing MA
and MS above, we have
>>> MS.intersection(MA) Multiset({'m': 1, 's': 4})
Prime factorization
The intersection of the prime factorization of a and the prime factorization of b is the prime factorization of the greatest common factor of a and b.
For example,
gcd(2³ × 3 × 5³, 2 × 5² × 17) = 2 × 5²
Function perspective
If the function f represents the multiset A and g represents the multiset B then the function min(f, g) represents the intersection of A and B.
Inclusion
A multiset A is included in a multiset B if every element of A is an element of B with the same or greater multiplicity.
Example with words
The multiset of letters in the word kale is included in the multiset of letters in kalman filter.
The multiset of letters in apple is not included in the multiset of letters in pale rider. The set
{a, p, l, e}
is contained in the set
{p, a, l, e, r, i, d}
but the multiset
{a, p, p, l, e}
is not contained in the multiset
{p, a, l, e, e, r, r, i, d}
because ‘p’ has multiplicity 2 in the former but multiplicity 1 in the latter.
Python example
The multiset
module overloads <=
for inclusion.
>>> Multiset("kale") <= Multiset("kalman filter") True >>> Multiset("apple") <= Multiset("pale rider") False
Prime factorization
The multiset of prime factors of a is included in the multiset of prime factors of b if and only if a divides b.
Function perspective
If the function f represents the multiset A and g represents the multiset B then A is included in B if and only if f ≤ g.
Sum
The union of two multisets may not be exactly what you expect. For example, maybe you expected the union of
{a, b, b}
with
{b, c}
to be
{a, b, b, b, c}
rather than
{a, b, b, c}.
There is an operation on multisets like this, but it’s called sum rather than union. In the sum of two multisets, multiplicities add.
Example with words
The sum of the multiset of letters in harry and potter is the multiset of letters in harrypotter.
Example with Python
>>> Multiset("harry") + Multiset("potter") Multiset({'h': 1, 'a': 1, 'r': 3, 'y': 1, 'p': 1, 'o': 1, 't': 2, 'e': 1})
Note that ‘r’ has multiplicity 3 in the sum because it has multiplicity 2 in the first multiset and 1 in the second.
Prime factorization
The sum of the multiset of prime factors of a and the multiset of prime factors of b is the multiset of prime factors of ab.
Function perspective
If the function f represents the multiset A and g represents the multiset B then f + g represents the sum of A and B.
Sticking with the letters rather than the prime factors: the multiset corresponding to a word is a bag of scrabble tiles sufficient to spell that word. The union of two words’ multisets is a bag of scrabble tiles sufficient to spell *either* word, but probably not both. (The union of two general multisets is a bag that can spell any word in either multiset.)
Good analogy. With that analogy, the sum of two multisets is pouring the tiles from each bag into a single bag.