98-364 MTA Microsoft Database Fundamentals – Querying and Manipulating Data Part 2
Welcome back. In this video, we’re going to start with querying databases, which means we’re going to request information from the database using Transact-SQL. If we head over to the Object Explorer, expand the Databases node, select the North Wind database, and expand it, you’ll see all the objects that are in this database. For now, we’ll focus only on tables; as you can see, the alt table name starts with DBO, which stands for Database Owner. This is the default database schema. A schema is basically a named location within the database used for organizing database objects.
We’ll cover this again when creating database objects. Now let’s look at our environment. In SQL Server Management Studio, we have two toolbars: the Standard toolbar and the SQL Editor toolbar. To open a new query window, we can use the keyboard shortcut Control-N or click on the New Query button. On the SQL Editor toolbar, we have a drop-down list containing the names of the databases on this server. Using this drop-down list, we can select which database we want to work with in the process of switching. The targeted database is called “context switching,” which can also be done using Transact-SQL. I will show you how to do that later on in this course. On the right of the drop-down list, we have the Execute button. This button is used to execute transactional SQL statements.
We can also make use of the shortcut, which is five. If we right-click on the customer table and select the top 1000 rows, we get A new query window opens with a generated script. The script is automatically executed, and the retrieved data is displayed in the Results tab. For now, I’m going to remove the top 1000 from the query so that we can focus on the basic construct of a select statement. The select statement always starts with the select command followed by the columns we want to have in the result set. Then we have the from clause to specify from which database object we’re querying data. Here we have what is called a three-part name.
We have the database name followed by the schema name, and then the object name. In this case, the customers sit at the table. It is not necessary to use three-part names when writing queries that target a single database, but I do recommend using two-part names. These consist of the schema name and the object name. If we head back over to the Object Explorer and expand the Customers table, you’ll see a classification of objects that are within the Customers table. If we expand the Columns node, you’ll see all the columns that are part of the table, as well as some basic information about the columns, such as data type, null ability, whether the column is a primary key or a foreign key, and so on.
If we look at the generated select statement, you’ll see that our select query covers all the columns of the customer’s table. When we write a query to select all the columns from a table, we don’t have to explicitly specify all the columns. We can simply make use of the asterisk, which in database terms refers to all. If I hover over the asterisk, you’ll see that it includes all the columns of the customer’s table. Executing this query will give us the same result as the previous query. Another very important thing you may have noticed are the square brackets around the object names.
Square brackets are necessary if the names of the objects are reserved keywords or contain special characters such as a space or a hyphen. So in our case, we could just simply remove the square brackets, and our query would still execute. But that would not be the same if we queried the Order Details table.
A simple select statement on the Order details table without square brackets results in incorrect syntax, and that’s because the name of the table contains a space and also because order is a reserved keyword. Basically, all the words that have a blue font colour are reserved keywords. We fix this syntax error by putting the table name within square brackets.
The last thing I want to use when writing SQL statements is the semicolon. The semicolon character is a statement terminator, so if we have a number of sequel statements and we want to execute them together, we use the semicolon to separate them. The semicolon should always be added at the end of a sequel statement. While it is possible to execute multiple transaction sequel statements without using a semicolon in some cases, it is always a good practise to properly end your statements. We’ve now covered the basic structure of a select statement. In the next video, we’re going to look at aliasing. see you then.
Welcome back. In this video, we’re going to look at how we can apply aliases to database objects. So what is an alias? An alias is basically a temporary nickname we can assign to database objects, also referred to as the correlation name. Instead of using the full qualifying name of an object, we can simply use its alias to refer to it. We do that by using the as clause, which is most commonly used when joining tables. We haven’t covered the concept of joining tables yet, but when we write a query that spans multiple tables that have columns with the same name, you have to be specific. Imagine we’re writing a query to get data from the products and the suppliers table.
Both tables have a common column, which is the Supplier ID column. Writing a query based on these tables without being specific will cause an “ambiguous column name error,” which basically means SQL Server cannot determine which column we’re referring to. Are we referring to the Supplier ID column from the Products table? Or are we referring to the Supplier ID column from the Suppliers table? It’s ambiguous. So to fix this error, we need to be specific. And there are a number of ways we can do that. First, we can use a multipart identifier consisting of the name of the table and the name of the column. The problem with this method is that, in order to prevent ambiguity and be consistent, you’d write products in front of every column of the Products table.
That’s a lot of letters you’re going to have to repeat. And it’s never an ideal situation to have a lot of code for a query. So another thing we could do is give the “Products table a rather short nickname. Here’s where aliasing comes in. Instead of having to write “products” every single time, we could just simply write the letter P instead. The letter “P” then becomes the nickname for the Products table. As we use the nickname, we can no longer use the name of the table in the multipart identifier, so we’re going to replace its occurrences with the letter P.
You can perform the replacing action by pressing CTRL H and AltA, or you can do it manually. This is one of the many scenarios where aliases are pretty helpful. For now. It may be a little hazy, but we’ll COVID this more thoroughly when working with joins. Another use case for aliases is, for example, to improve the readability of column names or to protect the names of the columns. When we design database objects, we try not to use spaces, and that means that certain objects will have names consisting of multiple words, sometimes even abbreviations. If someone else is looking at the queried data, they may not understand those names or abbreviations. And that’s also where aliasing comes in.
If we look at our table, most of the column names are readable, but just for explanation purposes, we’re going to assume that the quantity per unit column is not readable. Changing the name of the column means altering the structure of the table, and that is not an option. We’re going to give the column a nickname using the as clause. This time, we’ll write quantity per unit, complete with spaces. Here we can use either square brackets or single quotes to write the nickname.
Whatever you prefer. If we execute the query, we can now see the nickname we specified as the column name. And this is how we apply nicknames to existing columns. Now, let’s take a look at the next query. Here we’re querying the order details table. There’s not really a lot of data here, just numerical values. As you can see, we have the unit price, the quantity, and the discount columns. These are all the details we need for calculating the net amount, which is basically the price the products have been sold for. We’re now going to create an expression to calculate the net amount.
This will act as a new column. An expression could be a formula, a function, a column, or basically anything that results in a value. So we’re just going to add a comma here, and this will be the position for our new column. Let’s calculate the net amount. If we didn’t have the discount attribute, the net amount would be the unit price times the quantity.
So to calculate the net amount with the discount rate, we’re going to multiply the unit price by the quantity and then multiply the result of that by one minus the discount. The reason why we’re subtracting the discount from one and not 100 is because the discount rate is not at its actual percentage value. As you can see here, we have 0 points, but it’s actually 15%. So if we run this query, we’ll see a new column. Noteworthy about this column is the fact that it doesn’t have a name. It’s just a formula that’s being executed for each row. So we’re going to use the as clause to give this column a correlated name. We’re going to call it the net amount. Now, if we run the query again, we can see that we now have a name for our column. When creating expressions, it’s always best practise to name your columns. That’s it for aliasing. Thank you for watching. In the next video, we’re going to look at filtering data. see you then.
refers to the action of retrieving only a subset of data based on one or multiple conditions. Imagine we have a large table containing billions of rows. This is data that has been recorded over a period of ten years. When we query the table, we don’t always want to see all of the data. We might, for example, only be interested in part of the data collected over the last two months. If we couldn’t filter the data, it would mean that every time we query the table, billions of rows would be transmitted over the network. That is very bad for performance, not to mention the drastic impact it will have on the network. This is where filtering comes in.
When we query the table, we can then apply a date-time filter to only retrieve data that has been collected over the last two months. Our large result set of billions of rows is now reduced to thousands of rows. And this is what filtering is all about. Only get the data that is useful. Filtering is done using the where clause. The following is the general syntax of the where clause. We have our basic select statement with the select command, then the select list, which again specifies the columns we want to see in the result set, followed by the from clause, where we specify from which object we want to retrieve the data. And then finally, we write the where clause. So the where clause always appears after we’ve specified the object to be queried. As stated earlier, filtering can be done based on one condition or multiple conditions. In this slide, we’re going to focus on forming one condition. A condition is basically a comparison between two values or expressions, and that’s done using comparison operators.
Here are a few comparison operators you’re probably already familiar with. We have the equal operator, which evaluates to true only if the expression one is equal to the expression two. An expression could be the result of an equation, a function, or a column. Say we have a user’s table with the column “first name.” We want to select all the rows from the user’s table where the first name is equal to John. In this case, the column first name would be expression one. We would then use the equal operator, and John would be expression two.
This effectively retrieves only the rows where the first name is John. Then we have the greater-than operator, most frequently used with dates and numbers. The same principles apply. Here you get all the rows where expression 1 is greater than expression 2. Then we have the less than operator, the less than or equal to operator, the greater than or equal to operator, and finally the not equal operator. There are two ways we can test whether an expression is not equal to another expression. It completely depends on personal preference. We are now going to look at range operators.
As their name suggests, these operators use ranges to define the condition. We have the between operator as well as the in operator. The between operator is most commonly used to define dates and numerical ranges. The syntax of the between operator is as follows: first, we need to define which expression we will evaluate. Then we use the between keyword, followed by an expression that defines the expression’s minimum value. Then we use the “in” keyword followed by an expression that defines the maximum value expression one can have. Say we have a products table and we only want to see an overview of all the products that have a list price between $100 and $200. In this case, expression one would be the list price column, expression two would be 100, and expression three would be 200. This effectively retrieves only the rows where the list price is between $100 and $200. Now, with the in operator, we’re being a little bit more explicit in terms of defining the possible values that an expression can have.
The syntax is as follows: we have an expression that needs to be evaluated. We then use the in keyword followed by a list of expressions. This condition only evaluates to true if the result of expression one is found in the list of expressions. Say we have a customer’s table, and we’re only interested in all the customers who are from Germany and the UK. Expression one would be the country column, while Germany and the UK would be contained within the expression list. This effectively retrieves only the rows where the country column has a value of either Germany or the UK. In most cases where we use range operators, comparison operators could be used instead. But this becomes inefficient because instead of using a single range operator, you have to use two or more comparison operators to get the same result.
Both comparison and range operators are logical operators, and that’s because the conditions of these operators can only evaluate to true or false. The rows where the condition evaluates to true will be included, while the rows where the condition evaluates to false will be excluded. These are the basic operators you should be familiar with. For the exam, let’s now look at how we can form multiple conditions. When filtering data using multiple conditions, there’s always a logical relationship between the conditions. For now, we’re going to focus on two logical operators: the and operator and the or operator. When we use the and operator between two conditions, then both conditions have to be true for the rows to be included in the result set. If we use the or operator between two conditions, then at least one of them has to be true for the rows to be included in the result set. Let’s head back over to SQL Server Management Studio to see this in action. We’re going to look at the orders table and filter it using some of the logical operators we’ve just talked about. First, we’re going to look at the contents of the Orders table. As you can see, we have 830 rows in the Orders table. We’re now going to apply our first filter.
Let’s get all the rows from the Orders table where the ship country column is equal to Germany. If we execute the statement, you’ll see that we now have a total of 122 rows, all of which have a ship country equal to Germany. The reason why we’ve written Germany within single quotes is because the data type of the ship country column is a character string data type. Whenever we work with columns whose data type is either a character string or a temporal data type, we use single quotes to specify the values. If we are working with columns whose data type is a numerical data type, we do not use single quotes. Let’s take a look at a query with the not equal operator. With this query, we’re asking to retrieve all the orders where the ship country is not equal to the USA. If we execute the query, you’ll see that we have a total of 708 rows, none of which have a ship country equal to the USA.
We could have also used this operator, which would have had the same effect. In the next query, we’re using the greater than or equal to operator. The employee ID column has a numerical data type. This is the reason we haven’t used any single quotes. Executing the query results in 328 rows being returned. Let’s take a look at date values. The orders table has orders that were placed from July 1996 through March 1998. I only want to get the orders that have been placed since 1998. Here we use the greater-than-or-equal-to operator to get the expected result. Because the order date column has a date or a temporal data type, we use single quotes to specify the date value. Executing this query results in 270 rows being returned. We can compare the Order date in column two to the date value. You’ll see that it’s written in a certain format. We first have the year, then the month, and then the day. In SQL Server, we have numerous formats we can use to specify date values. This is one of the standard formats.
I’m not going to cover COVID date formats in this course because it’s not part of the exam, but I have attached a document where you can find the most commonly used date formats. Let’s now take a look at the operator. Here we have a query to retrieve all the orders that were placed in the years 90 97.So we have a date range here that starts on January 1, 1997 and ends on December 31, 1997. Now let’s execute the query and see the result.
One of the most important things to remember about the between operator is that when we specify the range boundaries, those boundaries will also be included in the result set. If we look at the returned rows, you’ll see that we also have orders that were made on January 1, 1997, and orders that were made on December 31, 1997. So it’s very important to be aware of this inclusive behavior between operators. Let’s take a look at the operator. As we’ve previously seen in the slides, the inoperatoris were used to specify a range of possible values. So with this select statement, we’re basically asking for all the orders where the ship country is either Germany, France, or Spain.
If we execute this statement, we should only see orders that have either one of these countries defined as the ship country, which is what we have here. If we look at the next query, you’ll see that we have two conditions. The first condition is to select all the rows where the order date is greater than or equal to January 1, 1997. And the second condition is to get the rows where the order date is less than or equal to the last day of December 1997. We’ve used the logical and operator to specify the relationship between these conditions. As we’ve seen earlier in the slides, in order for a row to be included in the results set when using the operator, the row would need to satisfy both conditions. As a result, the order date must be greater than or equal to January 1, 1997, and earlier than or equal to December 31, 1997. If we execute the query, we get a result of 408 rows. Actually, this query is a variation of one we’ve already done, and that’s this one.
Here we have used a range operator, namely the between operator, to specify a date range, while in the other example we’ve used comparison operators to specify a date range. If we execute it again, you’ll see the same 408 rows being returned. Let’s look at another example. Here we make use of the or and the operator to form multiple conditions. What I’m basically asking here is to see all the orders for which the ship country is either Germany or France and where the employee ID is equal to one. So when I execute this query, I should only see ones in the employee ID column and either Germany or France.
In the ship country column, I have made use of the parentheses to group conditions that belong together. If I remove the parentheses, SQL Server will execute the query based on its precedence. If I execute the query now, I’ll get a far different result than what I actually want. So it’s important to pay attention to this when you have multiple conditions.
The next query is kind of interesting. While it may seem like we have three conditions, we actually have only two conditions in this statement. It’s not necessary to use parentheses, and that’s because the first and is part of the between operator, but I just do it for readability. With this query, we’re essentially asking for all orders placed in 1998 with the UK as the ship country. If we execute this query, we get a result of 16 rows. The next query is a little different. If we look at the Ship Region column, you’ll see that we have a number of nulls here. If we want to get all the rows where the Ship region is null, we use the Is Null operator. The reason we can’t use a comparison operator to do this is because a null is not an actual value.
It’s a marker to signify an empty space or an unknown attribute. So, if we run it, we get all the rows where the Ship region contains no applicable value or is basically empty. If we look at the next query, you’ll see we’ve added a new keyword, and that is not. Adding a “not” to a condition returns the opposite. So if we execute this query, we’ll now get all the rows where the Ship region has an applicable value, or where the Ship region is basically not empty. The not operator can be added to any condition, thus turning it into a negative form, so to speak. For example, if we add the not keyword to this statement, we will now get the opposite result. Instead of getting all the orders where the ship country is either Germany or France, we get all the orders where the ship country is neither Germany, France, nor Spain. That’s it for filtering. Thank you for watching. In the next video, we’re going to look at some of the other awesome things we can do when querying databases. see you then.
Popular posts
Recent Posts