Introduction To SQL: Part I

Daniel Brancusi
Posted on Sep 30, 2020

What is SQL?

"SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems."  ANSI will come into play later as there are often many ways to execute the same task in SQL.  Make sure to check with your employer if they maintain ANSI standards in their code.   "SQL statements are used to perform tasks such as update data on a database, or retrieve data from a database."  There are several different companies that operate database management systems with SQL, including Microsoft and Oracle.  While SQL standard commands are largely transferable, most systems also have unique property extensions.   There are also NoSQL databases, however we will not be covering these.  "NoSQL databases are increasingly used in big data and real-time web applications."  Some notable NoSQL databases include MongoDB, Oracle NoSQL Database, and (the best named) Voldemort.

SQL is the language used to work with a Relational Database Management System, or RDBMS for short. The data held within an RDBMS is stored in tables like the one below.

A table from SQLDeveloper

All SQL tables are composed of fields. A field is simply a column within a table. As an example, in the table above, the fields are MAKE, MODEL, MODEL_YEAR, MILEAGE, etc. Each field holds a value for every record in the table, and each record has its own row (for example, there are 30 records in the table above). While small tables such as the one above are relatively simple to interpret by eye, tables can have millions of entries. Therefore, we need an efficient way to retrieve the information we're looking for.
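To make fields and records concrete, here is a minimal sketch that builds a tiny stand-in for the table above and counts both. It uses Python's built-in sqlite3 module purely as a convenient way to run SQL. The three rows and the column types are invented for illustration and are not the actual data from the table above.

```python
import sqlite3

# Hypothetical stand-in for LOT_CARS: the rows and column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOT_CARS ("
    "MAKE TEXT, MODEL TEXT, MODEL_YEAR INTEGER, MILEAGE INTEGER, "
    "COLOR TEXT, PRICE REAL, CONDITION_CD INTEGER)"
)
conn.executemany(
    "INSERT INTO LOT_CARS VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Ford", "Focus", 2015, 42000, "Blue", 9500.0, 2),
        ("Honda", "Civic", 2017, 30500, "Red", 13200.0, 1),
        ("Toyota", "Camry", 2012, 88000, "Silver", 7400.0, 3),
    ],
)

cursor = conn.execute("SELECT * FROM LOT_CARS")

# Fields are the columns: cursor.description has one entry per column.
field_names = [column[0] for column in cursor.description]

# Records are the rows.
records = cursor.fetchall()

print(len(field_names), "fields:", field_names)
print(len(records), "records")
```

Every record carries one value for each field, which is why each row in the result has the same length as the list of field names.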

Select, From, Where...

Three of the most widely used keywords in SQL are SELECT, FROM and WHERE (note: you do not have to capitalize these words but I do).   

SELECT

The SELECT statement tells SQL which fields (or columns) you would like returned by the query. Within the statement, aggregate functions such as COUNT, MAX, MIN and AVG can also be used (these will be covered in a later post). Finally, the * operator can be used to represent all fields. As an example, for our table above we could write a statement to select all fields in two ways:

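-- OPTION 1: LIST ALL FIELDS
SELECT
    MAKE,
    MODEL,
    MODEL_YEAR,
    MILEAGE,
    COLOR,
    PRICE,
    CONDITION_CD
FROM LOT_CARS  -- needed for the query to run; FROM is covered in the next section
;

-- OPTION 2: USE THE * OPERATOR
SELECT *
FROM LOT_CARS
;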

As we continue forward, the use of the * operator will become more natural. Also, a new line is not required for each field, but for readability I often put each field on its own line. This helps most with a long list, where referring back to a particular field becomes significantly easier.
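As a quick check of the points above, this sketch runs the fully listed query, the * query, and a lowercase version against a small stand-in table and compares the results. Python's built-in sqlite3 module is used only to execute the SQL, and the two sample rows and the column types are made up for the example.

```python
import sqlite3

# Hypothetical stand-in for LOT_CARS: the rows and column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOT_CARS ("
    "MAKE TEXT, MODEL TEXT, MODEL_YEAR INTEGER, MILEAGE INTEGER, "
    "COLOR TEXT, PRICE REAL, CONDITION_CD INTEGER)"
)
conn.executemany(
    "INSERT INTO LOT_CARS VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Ford", "Focus", 2015, 42000, "Blue", 9500.0, 2),
        ("Honda", "Civic", 2017, 30500, "Red", 13200.0, 1),
    ],
)

# Option 1: list every field by name.
listed = conn.execute(
    "SELECT MAKE, MODEL, MODEL_YEAR, MILEAGE, COLOR, PRICE, CONDITION_CD "
    "FROM LOT_CARS"
).fetchall()

# Option 2: use * to ask for all fields.
starred = conn.execute("SELECT * FROM LOT_CARS").fetchall()

# Keywords (and, in SQLite, table names) are not case sensitive.
lowercase = conn.execute("select * from lot_cars").fetchall()

# Naming only some fields returns only those columns.
two_fields = conn.execute("SELECT MAKE, MODEL_YEAR FROM LOT_CARS").fetchall()

print(listed == starred == lowercase)
print(two_fields)
```

Listing fields by name takes more typing than *, but it controls which columns come back and in what order.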

FROM

The FROM clause lets SQL know which table(s) should be referenced in the query. It may seem logical that FROM should begin the query, and you would be largely correct! When an SQL query is run, the FROM clause is the first item evaluated (along with JOIN, which we'll get into in the next post). Nevertheless, when writing SQL queries we keep the structure of SELECT followed by FROM.

Here are two examples using our LOT_CARS table from earlier:

-- EXAMPLE 1: USING ONE TABLE

SELECT 
  MAKE, 
  MODEL_YEAR, 
  CONDITION_CD
FROM LOT_CARS
;

-- EXAMPLE 2: USING MULTIPLE TABLES

SELECT 
  LC.MAKE, 
  LC.MODEL_YEAR, 
  LC.CONDITION_CD,
  CT.DESCRIPTION
FROM LOT_CARS AS LC, 
     CONDITION_TBL AS CT
;

note: notice the semicolon after each statement.  "The semicolon character is a statement terminator. It is a part of the ANSI SQL-92 standard"

While the first example above should be readily understood, the second deserves some explanation. First, the AS next to LOT_CARS and CONDITION_TBL lets us alias each table however we wish (there are some rare exceptions). We can also use AS to alias how columns are labeled when our query returns a result. The AS keyword is optional (we can simply place a space between the table name and the alias), but including it enhances readability. Be aware that some systems, Oracle among them, accept AS for column aliases but not for table aliases; there the space form is required. Next, we have added LC. and CT. to our selected fields. Because we are referencing two different tables, we need to specify which table a field comes from whenever more than one of the tables contains a field with the same name. When field names are unique, the prefix is not needed. Finally, note that listing two tables without a condition relating them pairs every record of LOT_CARS with every record of CONDITION_TBL; joins, covered in the next post, address this.
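The sketch below tries out the aliasing and prefix rules on two small stand-in tables. It runs the SQL through Python's built-in sqlite3 module, which happens to accept both alias forms. The layout of CONDITION_TBL (a CONDITION_CD and a DESCRIPTION field) and all sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical tables: the columns of CONDITION_TBL and all rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LOT_CARS (MAKE TEXT, MODEL_YEAR INTEGER, CONDITION_CD INTEGER);
    CREATE TABLE CONDITION_TBL (CONDITION_CD INTEGER, DESCRIPTION TEXT);
    INSERT INTO LOT_CARS VALUES ('Ford', 2015, 2), ('Honda', 2017, 1);
    INSERT INTO CONDITION_TBL VALUES (1, 'Excellent'), (2, 'Good'), (3, 'Fair');
""")

# Table aliases with and without AS give the same result in SQLite.
with_as = conn.execute(
    "SELECT LC.MAKE, CT.DESCRIPTION FROM LOT_CARS AS LC, CONDITION_TBL AS CT"
).fetchall()
without_as = conn.execute(
    "SELECT LC.MAKE, CT.DESCRIPTION FROM LOT_CARS LC, CONDITION_TBL CT"
).fetchall()

# A column alias changes the label of the returned column.
cursor = conn.execute("SELECT MAKE AS MANUFACTURER FROM LOT_CARS")
labels = [column[0] for column in cursor.description]

# CONDITION_CD exists in both tables, so it must be prefixed.
try:
    conn.execute("SELECT CONDITION_CD FROM LOT_CARS LC, CONDITION_TBL CT")
    ambiguous_error = None
except sqlite3.OperationalError as error:
    ambiguous_error = str(error)

print(len(with_as), "rows from 2 cars and 3 conditions")
print(labels)
print(ambiguous_error)
```

The row count is the product of the two table sizes because nothing in the query relates a car to its condition.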

We can also return a result utilizing only SELECT and FROM:

/* RETURNS THE MAKE AND MODEL_YEAR FIELDS FROM THE LOT CARS TABLE */

SELECT 
  MAKE, 
  MODEL_YEAR
FROM LOT_CARS
;

note:  to comment out a line in SQL we use two dashes and to comment out multiple lines we begin with /* and end with */

WHERE

While a table can be returned using only SELECT and FROM, the WHERE clause adds an immense amount of power to queries by filtering the returned results to only those that meet its condition. SQL has many comparison operators that can be used within a WHERE clause (as well as elsewhere in a query), and they will be familiar to anyone with even minimal programming experience.

OPERATOR | DESCRIPTION | EXAMPLE
= | Checks if operands are equal | MAKE = 'Ford'
!= | Checks if operands are not equal | MODEL != 'Civic'
<> | Checks if operands are not equal. This is the same as != but is "correct" under ANSI standards | MODEL <> 'Civic'
> | Checks if the left operand is greater than the right operand | 5 > 4 (True)
< | Checks if the left operand is smaller than the right operand | 5 < 4 (False)
!< | Checks if the left operand is not smaller than the right operand (not part of ANSI SQL; supported only by some systems) | 5 !< 4 (True)
!> | Checks if the left operand is not greater than the right operand (not part of ANSI SQL; supported only by some systems) | 5 !> 4 (False)
>= | Checks if the left operand is greater than or equal to the right operand | 5 >= 4 (True)
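If you want to check comparisons like these yourself, one option is to let a database evaluate them. The sketch below does that with Python's built-in sqlite3 module. SQLite reports true as 1 and false as 0, and it does not accept !< or !>, so those two are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each expression is evaluated by SQLite, which returns 1 for true and 0 for false.
comparisons = ["5 = 4", "5 != 4", "5 <> 4", "5 > 4", "5 < 4", "5 >= 4"]

results = {}
for expression in comparisons:
    results[expression] = conn.execute("SELECT " + expression).fetchone()[0]

for expression, value in results.items():
    print(expression, "->", bool(value))
```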

SQL also has many logical operators.  Some of the most important are listed below.

OPERATOR | DESCRIPTION
ALL | Returns true if ALL of the subquery values meet the given condition
ANY | Returns true if ANY of the subquery values meet the given condition
AND | Combines conditions in an SQL statement's WHERE clause; true only if all of them are true
BETWEEN | Searches for values that are within a given minimum and maximum value (both ends included)
EXISTS | Tests whether a subquery returns any records
IN | Determines if a record's value is contained in a specified list
LIKE | Searches for a specified pattern
NOT | Reverses the meaning of the logical operator it precedes
OR | Combines conditions in an SQL statement's WHERE clause; true if any of them is true
UNIQUE | Checks that the records returned by a subquery contain no duplicates (not widely supported; to return distinct values from a field, use SELECT DISTINCT)
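Here is a rough sketch of several of these operators at work inside a WHERE clause. It runs against a four-row stand-in table through Python's built-in sqlite3 module. The rows and the models_where helper are invented for the example, and ALL, ANY and UNIQUE are skipped because SQLite does not support them.

```python
import sqlite3

# Hypothetical stand-in for LOT_CARS: the rows and column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOT_CARS "
    "(MAKE TEXT, MODEL TEXT, MODEL_YEAR INTEGER, CONDITION_CD INTEGER)"
)
conn.executemany(
    "INSERT INTO LOT_CARS VALUES (?, ?, ?, ?)",
    [
        ("Ford", "Focus", 2015, 2),
        ("Ford", "Fusion", 2017, 1),
        ("Honda", "Civic", 2017, 4),
        ("Toyota", "Camry", 2012, 3),
    ],
)


def models_where(condition):
    """Illustrative helper: sorted MODEL values of the rows matching a condition."""
    query = "SELECT MODEL FROM LOT_CARS WHERE " + condition
    return sorted(row[0] for row in conn.execute(query))


in_list = models_where("MAKE IN ('Ford', 'Honda')")
pattern = models_where("MODEL LIKE 'F%'")  # % matches any run of characters
between = models_where("MODEL_YEAR BETWEEN 2015 AND 2017")  # both ends included
negated = models_where("NOT MAKE = 'Ford'")
both = models_where("MAKE = 'Ford' AND MODEL_YEAR = 2017")
either = models_where("MAKE = 'Toyota' OR CONDITION_CD = 1")

for name, value in [("IN", in_list), ("LIKE", pattern), ("BETWEEN", between),
                    ("NOT", negated), ("AND", both), ("OR", either)]:
    print(name, value)
```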

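We are now ready to use what we've learned to write queries in SQL. Let's add a simple WHERE clause to our previous example (string values go in single quotes under ANSI SQL; some systems also accept double quotes, while others read a double-quoted name as an identifier):

--EXAMPLE 1
SELECT 
  MAKE, 
  MODEL_YEAR, 
  CONDITION_CD
FROM LOT_CARS
WHERE MAKE = 'Ford'
;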

--EXAMPLE 2
SELECT 
    MAKE, 
    MODEL_YEAR, 
    CONDITION_CD
FROM LOT_CARS
WHERE 
    CONDITION_CD BETWEEN 1 AND 3 
    AND MODEL_YEAR <> 2017
;
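To see what the two examples above return, the sketch below runs them against a five-row stand-in for LOT_CARS using Python's built-in sqlite3 module. The rows are invented, and CONDITION_CD is assumed to be numeric, so treat the output as an illustration and not as the real table's contents.

```python
import sqlite3

# Hypothetical stand-in for LOT_CARS: the rows and column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOT_CARS (MAKE TEXT, MODEL_YEAR INTEGER, CONDITION_CD INTEGER)"
)
conn.executemany(
    "INSERT INTO LOT_CARS VALUES (?, ?, ?)",
    [
        ("Ford", 2015, 2),
        ("Ford", 2017, 5),
        ("Honda", 2017, 1),
        ("Honda", 2016, 4),
        ("Toyota", 2012, 3),
    ],
)

# Example 1: only the Fords.
example_1 = conn.execute(
    "SELECT MAKE, MODEL_YEAR, CONDITION_CD "
    "FROM LOT_CARS "
    "WHERE MAKE = 'Ford'"
).fetchall()

# Example 2: condition code 1 through 3, excluding model year 2017.
example_2 = conn.execute(
    "SELECT MAKE, MODEL_YEAR, CONDITION_CD "
    "FROM LOT_CARS "
    "WHERE CONDITION_CD BETWEEN 1 AND 3 "
    "AND MODEL_YEAR <> 2017"
).fetchall()

print(example_1)
print(example_2)
```

In the second query the 2017 Honda is dropped even though its condition code is in range, because AND requires both conditions to hold.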

You can now go out and write your own queries in SQL!  Next time we'll discuss joins and their use.  If you have any questions leave a comment below!
