# Statistics

An introduction to statistics.


## Overview

### Resources

- TEXT: Elementary Statistics (10th ed.) by Triola
- R: statistical software (free) [1]
- Spreadsheet programs:
  - Gnumeric (free for \*NIX systems): [2]
  - MS Excel (Windows)
  - LibreOffice (free, cross-platform; not as capable as the two above)

### Outline

Probability and statistics are fundamental to our description of the universe. We won't get quite that far in this course (we'll leave that for physics at university), but we will study the basics of statistics and probability that you see used every day, in election polling and sports statistics for example, as well as in the natural and social sciences. Since more and more data sources are being published online, we'll use real data as much as possible. We will also emphasize critical thinking when interpreting data and when spotting the misrepresentation of statistics.

### Grading

- Projects: 60%
- Quizzes: 20%
- Exams: 20%

## Why Statistics

**Assignment**: Each student will choose one of the items below and prepare a 5-minute presentation that includes:

- A brief summary of the article.
- Identification of at least one statistical method used or described in the article (even if only implied).
- A qualitative explanation of what the statistics mean (this will probably require additional research).

**Applications**

- TED: Sebastian Wernicke: Lies, damned lies and statistics (about TEDTalks)
- FiveThirtyEight: Why the Olympics Probably Won't Spread Zika
- The Guardian: Global warming: why is IPCC report so certain about the influence of humans?
- FiveThirtyEight: Who Will Win the Presidency, and its explanation
- TED: Hans Rosling: The best stats you've ever seen
- RadioLab: Stochasticity
- NASA: Global Temperature Anomalies
- FiveThirtyEight: Religious Diversity May Be Making America Less Religious
- The Guardian: Nasa: Earth is warming at a pace 'unprecedented in 1,000 years'

## Data Sources

- Climate data (temperature, precipitation, winds, etc.):
  - National Climatic Data Center: [3]
- Geophysics (Earth and environmental data):
  - NGDC: [4]
- Electoral:
  - ElectoralVote.com: [5]
- Sports:
- Crime:
  - St. Louis Police Department Crime Files: [8]

## Graphing Data

Project 1: Use available data to answer a research question.

- Student Question: Is there a difference in the crimes committed in the summer and the winter in St. Louis?

Skills:

- Frequency Distributions
- Histograms
- Statistical Graphics
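
To approach the student question about summer versus winter, the monthly counts first have to be grouped by season. The sketch below is a minimal Python example under stated assumptions: the seasons are meteorological seasons (December to February is winter, June to August is summer), which is our choice rather than something the course specifies; the input is a hypothetical dictionary mapping `(year, month)` to a crime count; and the names `SEASON_OF_MONTH`, `totals_by_season`, and `season_percentages` are illustrative only. The percentages are what a seasonal pie chart would show.

```python
from collections import defaultdict

# Assumption: meteorological seasons (Northern Hemisphere).
SEASON_OF_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def totals_by_season(monthly_counts):
    """Sum crime counts by season; monthly_counts maps (year, month) -> count."""
    totals = defaultdict(int)
    for (_year, month), count in monthly_counts.items():
        totals[SEASON_OF_MONTH[month]] += count
    return dict(totals)


def season_percentages(totals):
    """Convert seasonal totals to percentages of the grand total (pie-chart slices)."""
    grand_total = sum(totals.values())
    return {season: 100 * count / grand_total for season, count in totals.items()}


# Hypothetical counts for illustration only (not real St. Louis data).
counts = {(2015, 1): 10, (2015, 7): 30, (2015, 12): 5}
totals = totals_by_season(counts)
print(totals)  # {'Winter': 15, 'Summer': 30}
print(season_percentages(totals))
```

Seasons also differ slightly in length (90 or 91 days for December to February versus 92 for June to August), so comparing raw seasonal totals raises the same issue that the per-day adjustment addresses for months.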

HW: 8/30:

- Redo your graphs of crimes per month to adjust for the fact that different months have different numbers of days.
- Identify two follow-up questions based on the results of your analysis of the crime data AND describe how you would go about analyzing the data to answer those questions. Note: you do not have to answer the questions, but your description should be detailed.
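
The per-day adjustment in the first item divides each month's total by the number of days in that month. Here is a minimal Python sketch using the standard-library `calendar` module, which also handles leap-year Februaries; the function name `crimes_per_day` and the counts are hypothetical placeholders, not the course's data.

```python
import calendar


def crimes_per_day(year, month, crime_count):
    """Divide a month's crime count by the number of days in that month."""
    # calendar.monthrange returns (weekday of the first day, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return crime_count / days_in_month


# Hypothetical monthly totals for illustration only (not real SLMPD data).
monthly_counts = {(2016, 1): 3100, (2016, 2): 2900}

for (year, month), count in monthly_counts.items():
    rate = crimes_per_day(year, month, count)
    print(f"{year}-{month:02d}: {rate:.1f} crimes/day")
# prints 100.0 crimes/day for both months
```

The raw totals differ by 200, yet the daily rates are identical, because February 2016 (a leap year) has only 29 days.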

HW: 9/2:

- Write a report summarizing all the analyses we did with the St. Louis crime data.
  - How to write lab reports
  - Be sure to include all the parts of the report, including Introduction, Procedure, ...
- Include the following graphs:
  - Scatter graph of the raw data: the number of crimes in each individual month over the entire 8 years.
  - Scatter graph showing the number of crimes clustered by month (all the Januarys in one column, and so on).
  - Column chart of the average crimes per month.
  - Column chart of crimes per day vs. month.
  - Pie chart of the seasonal data.
  - Frequency distribution using all of the data.
  - Percentage frequency distribution of all the data.
  - Cumulative distribution using all of the data.
  - Percentage cumulative distribution graph.

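
The frequency, percentage frequency, and cumulative distributions in the graph list can all be built in one pass over the classes. A minimal Python sketch, assuming equal-width classes that include their lower limit and exclude their upper limit (a convention chosen here so that decimal crimes/day values land in exactly one class); `frequency_distribution` and the sample values are hypothetical.

```python
def frequency_distribution(values, first_lower_limit, class_width, num_classes):
    """Return one row per class: (lower, upper, freq, percent, cum_freq, cum_percent).

    Each class covers lower <= x < upper. Values outside every class are not
    counted, so choose the limits to span the whole data set.
    """
    n = len(values)
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = first_lower_limit + i * class_width
        upper = lower + class_width
        freq = sum(1 for v in values if lower <= v < upper)
        cumulative += freq
        rows.append((lower, upper, freq, 100 * freq / n,
                     cumulative, 100 * cumulative / n))
    return rows


# Hypothetical crimes/day values for illustration only.
data = [92.3, 95.0, 101.2, 104.8, 108.1, 111.5, 115.0, 99.9]
for lower, upper, f, pct, cf, cpct in frequency_distribution(data, 90, 10, 3):
    print(f"[{lower}, {upper}): {f}  {pct:.1f}%  cumulative {cf}  {cpct:.1f}%")
```

Textbook frequency tables often write classes with inclusive limits such as 90-99; for whole-number data that is the same class as `[90, 100)` here.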

## Descriptive Statistics

The first three assignments are to be considered daily work, which means that if you are unable to complete them during the class period, you can turn them in at the beginning of class on the following day.

Assignment 1:

- Measures of central tendency (mean, median, mode, midrange)
- Reading: Sec. 3.1-3.2
- Assignment: Sec. 3.2 (p. 86): #5-21 odd, 29
- Advanced Assignment: #33, 34, 35
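
As a check on hand calculations, the four measures of center can be computed with Python's standard `statistics` module. This is a minimal sketch; the function name `central_tendency` and the sample list are illustrative, and midrange is computed by hand because the module has no function for it.

```python
import statistics


def central_tendency(values):
    """Mean, median, mode(s), and midrange of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode (Python 3.8+) returns every most-frequent value; if each value
        # occurs only once it returns all of them, which the textbook would call "no mode".
        "mode": statistics.multimode(values),
        "midrange": (min(values) + max(values)) / 2,
    }


print(central_tendency([2, 3, 3, 5, 7, 10]))
# mean 5, median 4.0, mode [3], midrange 6.0
```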

Assignment 2:

- Measures of variability (range, variance, standard deviation, coefficient of variation)
- Reading: Sec. 3.3
- Assignment: Sec. 3.3: #1, 3, 7, 9, 13, 15, 21, 35, 37
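
The measures of variability can be checked the same way. A minimal Python sketch, assuming the data are a sample (so variance and standard deviation divide by n - 1); for a whole population the module's `pvariance` and `pstdev` would be used instead. The function name `variability` and the sample values are illustrative.

```python
import statistics


def variability(values):
    """Sample range, variance, standard deviation, and coefficient of variation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance, s squared
        "std_dev": s,
        "cv_percent": 100 * s / mean,  # CV = s / mean, expressed as a percentage
    }


print(variability([2, 4, 4, 4, 5, 5, 7, 9]))
# range 7, variance 32/7 (about 4.571), std_dev about 2.138, CV about 42.8%
```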

Assignment 3:

- Measures of relative standing (z-score)
- Reading: Sec. 3.4
- Assignment: Sec. 3.4: #1, 5-25 odd
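
A z-score measures how many standard deviations a value lies from the mean, z = (x - mean) / s. Below is a minimal Python sketch using the sample mean and sample standard deviation; the cutoff of 2 reflects the common rule of thumb that values with |z| > 2 are unusual (check Sec. 3.4 for the textbook's exact wording), and the function names are illustrative.

```python
import statistics


def z_score(x, values):
    """z = (x - sample mean) / sample standard deviation."""
    return (x - statistics.mean(values)) / statistics.stdev(values)


def is_unusual(z, cutoff=2.0):
    """Rule of thumb: flag values more than `cutoff` standard deviations from the mean."""
    return abs(z) > cutoff


data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score(9, data)
print(round(z, 2), is_unusual(z))  # 1.87 False
```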

Assignment 4:

- Descriptive statistics for the St. Louis crime data.
  - Use the crimes/day data for each month of the dataset.
  - Full dataset work:
    - Calculate the descriptive statistics for the data set (all of the measures of central tendency and measures of variability).
    - Based on our previous exploration of the data, the crime rates in 2015 seemed to be unusual. a) Calculate the z-score for each month in 2015 compared to the full dataset. b) Do the z-scores indicate that the data are anomalous? Explain your answer.
  - For your assigned months (each student was assigned two months to work on):
    - Calculate the descriptive statistics for your months.
    - Determine the z-score for each of your months in 2015 compared to the rest of that month's data. Does the z-score indicate that it is unusual or not? Why?
    - Make a prediction of the crime rate for your months in 2016. Explain how you made the prediction.

- Presentation:
  - Give a 15-minute group PowerPoint presentation explaining what your group has discovered in its analysis of the St. Louis crime data (including both the full-dataset analysis and the monthly analyses).
  - Assume your audience has only a basic knowledge of statistics (about as much as you knew before you started this class).
  - Decide as a group how you want to present the monthly data (individually or combined).
  - Each person must have a speaking part in the presentation.
  - Include lots of graphs.
  - Record all questions asked by the audience.
  - Follow-up assignment: prepare written answers to all of the questions asked by the audience.
- Assessment:
  - Verbal presentation: 20 points for full participation and a well-prepared presentation (practicing ahead of time helps).
  - Visual presentation: 40 points for a clear PowerPoint with good graphs, details, and explanations of what the statistics mean.
  - Follow-up: 20 points for clear, well-thought-out written responses to the feedback from your audience.

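
The two z-score comparisons in Assignment 4 differ only in which values form the baseline. A minimal Python sketch, assuming the crimes/day rates are stored in a hypothetical dictionary mapping `(year, month)` to a rate; `z_scores_for_year`, `z_score_vs_other_years`, and the sample numbers are illustrative, not the course's data. The per-month comparison leaves the target year out of the baseline and needs at least two other years of that month.

```python
import statistics


def z_scores_for_year(rates, year):
    """z-score of each month in `year` relative to every monthly rate in `rates`.

    rates maps (year, month) -> crimes per day.
    """
    all_values = list(rates.values())
    mean = statistics.mean(all_values)
    s = statistics.stdev(all_values)
    return {month: (rate - mean) / s
            for (y, month), rate in rates.items() if y == year}


def z_score_vs_other_years(rates, month, year):
    """z-score of one month's rate compared with the same month in the other years."""
    others = [r for (y, m), r in rates.items() if m == month and y != year]
    return (rates[(year, month)] - statistics.mean(others)) / statistics.stdev(others)


# Hypothetical crimes/day rates for illustration only.
rates = {(2013, 1): 90.0, (2014, 1): 100.0, (2015, 1): 120.0,
         (2013, 7): 110.0, (2014, 7): 120.0, (2015, 7): 130.0}
print(z_scores_for_year(rates, 2015))
print(z_score_vs_other_years(rates, 1, 2015))
```

The full-dataset version keeps 2015 in the baseline, which pulls the mean and standard deviation toward the 2015 values and shrinks their z-scores; whichever choice you make, state it in your report.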

## Linear Regression

- Textbook: Chapter 10.