612 CHAPTER 27: SUBSETS
= (SELECT MIN(keycol)
+ (MAX (keycol) - MIN (keycol) * RANDOM())) FROM LotteryDrawing AS L2);
Here is a version which uses the COUNT(*) functions and a self-join instead.
SELECT L1.*
FROM LotteryDrawing AS L1
WHERE CEILING ((SELECT COUNT(*) FROM LotteryDrawing) * RANDOM())
= (SELECT COUNT(*)
FROM LotteryDrawing AS L2 WHERE L1.keycol <= L2.keycol);
The rounding away from zero is important, since we are in effect numbering the rows from one. The idea is to use the decimal fraction to hit the row that is far into the table when the rows are ordered by the key.
Having shown you this code, I have to warn you that the pure SQL has a good number of self-joins, and they will be expensive to run.
27.3 The CONTAINS Operators
Set theory has two symbols for subsets. One, , means that set A is contained within set B; this is sometimes said to denote a proper subset. The other, means “is contained in or equal to,” and is sometimes called just a subset or containment operator.
Standard SQL has never had an operator to compare tables against each other for equality or containment. Several college textbooks on relational databases mention a CONTAINS predicate, which does not exist in Standard SQL. This predicate existed in the original System R, IBM’s first experimental SQL system, but it was dropped from later SQL implementations because of the expense of running it.
27.3.1 Proper Subset Operators
The IN predicate is a test for membership. For those of you who remember your high school set theory, membership is shown with a stylized epsilon with the containing set on the right side: a ∈ A. Membership is for one element, whereas a subset is itself a set, not just an element. As an example of a subset predicate, consider a query to tell
27.3 The CONTAINS Operators 613
you the names of each employee who works on all of the projects in department 5. Using the System R syntax:
SELECT name -- Not valid SQL! FROM Personnel
WHERE (SELECT project_nbr FROM WorksOn
WHERE Personnel.emp_nbr = WorksOn.emp_nbr) CONTAINS
(SELECT project_nbr FROM Projects
WHERE dept_nbr = 5);
In the second SELECT statement of the CONTAINS predicate, we build a table of all the projects in department 5. In the first SELECT statement of the CONTAINS predicate, we have a correlated subquery that will build a table of all the projects each employee works on. If the table of the employee’s projects is equal to or a superset of the department 5 table, the predicate is TRUE.
You must first decide what you are going to do about duplicate rows in either or both tables. That is, does the set { a, b, c } contain the multiset { a, b, b } or not? Some SQL set operations, such as SELECT and UNION, have options to remove or keep duplicates from the results (e.g., UNION ALL and SELECT DISTINCT).
I would argue that duplicates should be ignored, and that the multiset is a subset of the other. For our example, let us use a table of employees and another table with the names of the company bowling team members, which should be a proper subset of the Personnel table. For the bowling team to be contained in the set of employees, each bowler must be an employee; or, to put it another way, there must be no bowler who is not an employee.
NOT EXISTS (SELECT * FROM Bowling AS B1
WHERE B1.emp_nbr NOT IN (SELECT emp_nbr FROM Personnel))
27.3.2 Table Equality
How can I find out if two tables are equal to each other? This is a common programming problem, and the specification sounds obvious.
When two sets, A and B, are equal, then we know that:
614 CHAPTER 27: SUBSETS
1. Both have the same number of elements 2. No elements in A are not in B
3. No elements in B are not in A
4. Set A is equal to the intersection of A and B 5. Set B is equal to the intersection of A and B 6. Set B is a subset of A
7. Set A is a subset of B
as well as probably a few other things vaguely remembered from an old math class. But equality is not as easy as it sounds in SQL, because the language is based on multisets or bags, which allow duplicate elements, and the language has NULLs. Given this list of multisets, which pairs are equal to each other?
S0 = {a, b, c} S1 = {a, b, NULL}
S2 = {a, b, b, c, c} S3 = {a, b, NULL} S4 = {a, b, c}
S5 = {x, y, z}
Everyone will agree that S0 = S4, because they are identical. Everyone will agree that S5 is not equal to any other set because it has
no elements in common with any of them. How do you handle redundant duplicates? If you ignore them, then S0 = S2. Should NULLs be given the benefit of the doubt and matched to any known value or not? If so, then S0 = S1 and S0 = S3. But then do you want to say that S1 = S3 because we can pair up the NULLs with each other?
To make matters even worse: are two rows equal if they match on just their keys, on a particular subset of their columns, or on all their columns? The reason this question comes up in practice is that you often have to match up data from two sources that have slightly different versions of the same information (i.e., “Joe F. Celko” and “Joseph Frank Celko” are probably the same person).
The good part about matching things on the keys is that you do have a true set—keys are unique and cannot have NULLs. If you go back to the list of set equality tests that I gave at the start of this article, you can see some possible ways to code a solution.
27.3 The CONTAINS Operators 615
If you use facts 2 and 3 in the list, then you might use NOT EXISTS() predicates.
...
WHERE NOT EXISTS (SELECT * FROM A
WHERE A.keycol
NOT IN (SELECT keycol FROM B
WHERE A.keycol = B.keycol)) AND NOT EXISTS (SELECT *
FROM B
WHERE B.keycol
NOT IN (SELECT keycol FROM A
WHERE A.keycol = B.keycol))
This query can also be written as:
WHERE NOT EXISTS (SELECT *
FROM A
EXCEPT [ALL] SELECT *
FROM B
WHERE A.keycol = B.keycol) UNION
SELECT * FROM B
EXCEPT [ALL] SELECT *
FROM A
WHERE A.keycol = B.keycol))
The use of the optional EXCEPT ALL operators will determine how duplicates are handled.
However, if you look at 1, 4, and 5, you might come up with this answer:
...
WHERE (SELECT COUNT(*)FROM A)
616 CHAPTER 27: SUBSETS
= (SELECT COUNT(*) FROM A INNER JOIN B
ON A.keycol = B.keycol) AND (SELECT COUNT(*)FROM B)
= (SELECT COUNT(*)
FROM A INNER JOIN B
ON A.keycol = B.keycol)
This query will produce a list of the unmatched values; you might want to keep them in two columns instead of coalescing them as I have shown here.
SELECT DISTINCT COALESCE(A.keycol, B.keycol) AS non_matched_key FROM A
FULL OUTER JOIN B
ON A.keycol = B.keycol WHERE A.keycol IS NULL
OR B.keycol IS NULL;
Eventually, you will be able to handle this with the INTERSECT [ALL] and UNION [ALL] operators in Standard SQL and tune the query to whatever definition of equality you wish to use.
Unfortunately, these examples are for just comparing the keys. What do we do if we have tables without keys, or if we want to compare all the columns?
GROUP BY, DISTINCT, and a few other things in SQL treat NULLs as if they were equal to each other. This is probably the definition of equality we would like to use.
Remember that if one table has more columns or more rows than the other, we can stop right there, since they cannot possibly be equal under that definition. We have to assume that the tables have the same number of columns, of the same type, and in the same positions. But row counts look useful. Imagine that there are two children, each with a bag of candy. To determine that both bags are identical, the first children can start by pulling a piece of candy out and asking the other, “How many red ones do you have?” If the two counts disagree, we know that the bags are different. Now ask about the green pieces. We do not have to match each particular piece of candy in one bag with a particular piece of candy in the other bag. The counts are enough information, only if they differ. If the counts are the same, more work needs to be done. We could each
...
- tailieumienphi.vn

nguon tai.lieu . vn