Week |
Date |
Topic/Materials/Tasks |
1 |
Mon Jan 13 |
- Introduction, syllabus, etc.
- Get set up on the
Sagemath Cloud:
- Use Firefox (or Chrome), not Internet Explorer.
- Sign up for a free account using your real name
and your University email address (@ohio.edu). Sign
in.
- Click on "Help" in the upper left and read about it.
- Look for a project in your account titled with your
name. I created this and shared it with you. Your
submissions as an individual, such as your
biography, go here.
- Look for a project "StatisticalComputing". I
will put things here for the whole class to use.
- Do your Biographical Survey
|
Wed Jan 15 |
- Find your partner for this week's tasks and
journal, and sit next to them.
- Look for a project shared with your partner. You
will submit your journal using this project.
- One of you upload the journal sample to this
shared project. You can then both edit it.
- Play with this journal. A cell with code is
run by entering it, holding the "shift" key and
hitting the "enter" (return) key.
A cell with pretty text can be made using either
html
or markdown. (You can see how
a cell with text was made by selecting it and
hitting the button in the upper left that looks
like an eye with a slash through it.)
- Introduction to R:
- Using R in the sage cloud:
- Upload the sage worksheet basics.sagews.
Run each cell of code and observe/guess what it is doing.
|
Thu Jan 16 8am Biographical survey (as a
journal) due. |
Fri Jan 17 |
- Read about the game Checking
an Industrial Process and download the data.
- Figure out how to convert the data, upload it, and
get it into R. (See the R
Data Import/Export manual.) End with two data sets
in R, named NonDestructive and Destructive, with
(shortened) column headings "day", "cylinder",
"coord_x", "coord_y", "coord_z", "Ni", and "Cr". Apply
summary() to show that it loaded correctly.
|
2 |
Mon Jan 20 | Martin Luther King, Jr. Day
holiday |
Wed Jan 22 |
- Find your partner for this week's tasks and
journal, and sit next to them.
- Read about
vector manipulations,
data frames,
writing your own functions,
table(), summary(), length(), split(), subset(), and lapply().
- Reread the game description to see how the samples
were supposedly selected, and check:
-
Are there the correct number of cylinders selected
each day?
-
Are the correct number of samples taken from each
cylinder?
Use the functions that you just read about or others
you may find.
|
Thu Jan 23 8am journal for Jan 13-17
due. |
Fri Jan 24 |
- Rate your partner from last week.
- Read about selecting
subsets of the data.
-
The game description claims that the Company's data on
Cr and Ni percentages satisfy the criteria set by both
the Industry and the Safety Authority. Verify this.
|
3 |
Mon Jan 27 |
- Find your partner for this week's tasks and
journal, and sit next to them.
- Read the Good Problems handout on
Introductions and Conclusions. Starting with your
week 3 journal, you need to include an introduction and
a conclusion.
- Plotting:
- Read about Graphical
Procedures in R, especially plot(),
points() and lines().
- Upload the sage worksheet
r-plotting.sagews. Read through it, evaluating each cell.
- Using the Destructive dataset, make a plot of
coord_x by Ni. Make a second plot only showing
points with coord_y=0. Make a third plot only
showing points with coord_y=0 and with points
color-coded by coord_z. What do you observe?
Repeat for the NonDestructive dataset.
|
Tue Jan 28 8am journal for Jan 22-24 due. |
Wed Jan 29 |
- Rate your partner from last week.
- Remarks:
- If someone reads only the text (none of the
code) in your journal, it should make sense to
them. Use this standard when deciding what and how
much to write.
- Some functions it would have
been useful to know already are: head(), unique(),
all(), paste(), rbind(), and cbind().
- Common discrete probability distributions:
- Read about the R
functions for the binomial, geometric,
hypergeometric, poisson, and negative binomial
distributions.
- Read about mean(),
and var().
- For each of the five above distributions:
-
Choose some parameters (not trivial like 0 or 1).
- Use the d....() function to generate the probability
distribution function and plot it.
- Use the r....() function to generate 1000
data points. Use table() and lines() (or
points()) and appropriate scaling to plot it on
the same graph as the distribution function, so
that they approximately match.
- Use mean() and var() to check the mean and variance
of the data; compare to the theoretical values.
|
Fri Jan 31 |
- Common continuous probability densities:
- Read about the R
functions for the uniform, normal, gamma, and
beta distributions.
- Read about density().
- For each of the four above densities:
-
Choose some parameters (not trivial like 0 or 1).
- Use the d....() function to generate the probability
density function and plot it.
- Use the r....() function to generate 1000 data
points. Use density() and lines() to plot it
on the same graph as the distribution function.
- Use mean() and var() to check the mean and variance
of the data; compare to the theoretical values.
|
4 |
Mon Feb 3 |
Snow day! Class canceled.
|
Tue Feb 4 8am journal for Jan 27-31 due. |
Wed Feb 5 |
- Rate your partner from last week.
- Find your partner for this week's tasks and
journal, and sit next to them.
- Making functions:
- Read about
writing your own functions,
return(), if(), sapply(), and while().
- Implement the function
\(f(x)=\left\{\begin{array}{ll} 1-|x| & -1\le x \le 1\\ 0 & \text{otherwise}\end{array}\right.\) and plot it on \([-2,2]\).
- Find the explicit formula for the function
\(F(x)=\int_{-\infty}^x f(t)dt\), implement it,
and plot it on \([-2,2]\).
- Find the explicit formula for the function
\(F^{-1}(y)\), implement it, and plot it on an
appropriate interval.
- Custom densities:
- Read about Inverse
transform sampling. Summarize the method
in your own words. Use this method to generate
samples from the probability density function \(f(x)\)
above. Show that it worked.
- Read about Rejection
sampling. Summarize the method in your
own words. Use this method to generate samples from
the probability density function \(f(x)\) above.
Show that it worked.
|
Fri Feb 7 |
- Read about library() and matrix().
- Multivariate Normal distributions:
- Read about the Multivariate Normal distribution and the methods for drawing values from it. Summarize in your own words.
- Read about the rmvnorm(), pmvnorm(), qmvnorm(), and
dmvnorm() functions in the mvtnorm package.
- Using \(\mu=[1,2]\) and
\(\Sigma=\left[\begin{array}{cc}1&0.8\\ 0.8&1\end{array}\right]\)
generate 1000 samples and plot them.
|
5 |
Mon Feb 10 |
- Find your partner for this week's tasks and
journal, and sit next to them.
- Linear Models
- Game coord_z dependence:
- From the Destructive dataframe make a dataframe
that only uses points where coord_y==0, has first
column coord_z, and has second column Niun8x (Ni
after removing the mean 8 and the dependence on
coord_x). Plot it.
- Use lm() to model Niun8x as a line in
coord_z. Plot the resulting line along with the
original data. Factor out the slope to get Niun8x
modeled as proportional to (coord_z+C) for some
(rounded) C. Plot coord_z versus Niun8x/(coord_z+C);
how well did it do?
- Use lm() and I() to model Niun8x as a quadratic
in coord_z. Do you think the actual relationship is
linear or quadratic?
|
Tue Feb 11 8am journal for Feb 3-7 due. |
Wed Feb 12 |
- Rate your partner from last week.
- Nonlinear models:
- Game coord_y dependence:
- From the Destructive dataframe make a dataframe
that has first column coord_y and has second column
Niun8xz (Ni after removing the mean 8, the
dependence on coord_x, and the dependence on
coord_z). Plot it.
- Guess a functional form (or a few) for how it
depends on coord_y, such as \(ay+b\), \(a\exp(by)\),
\(a\exp(b(y+c)^2)\), etc. Use nls() to find the
parameters then plot the original data and the
fitted values. Which functional form matches best?
- Using your best match but ignoring the overall
amplitude, Divide Niun8xz by this function of
coord_y and plot. How well did it do?
|
Fri Feb 14 |
- Game day and cylinder dependence:
- From the Destructive dataframe make a dataframe
that has first column day, the second column cylinder,
and third column
Niun8xzy (Ni after removing the mean 8, the
dependence on coord_x, the dependence on
coord_z, and the dependence on coord_y).
- Try to determine the dependence of Niun8xzy on
day and/or cylinder and remove it.
|
6 |
Mon Feb 17 |
- Find your partner for this week's tasks and
journal, and sit next to them.
- Monte Carlo methods for integration:
- Read about Monte
Carlo integration. Summarize the method in your
own words. Use the (basic) method to estimate
\(\int_{-1}^1 (1-x^2)\,dx\).
- Read about Antithetic
variates. Summarize the method in your own
words. Use this method to estimate \(\int_{-1}^1
(1-x^2)\,dx\).
- Read about Importance
sampling. Summarize the method in your own
words. Use the method to estimate \(\int_{-1}^1
(1-x^2)\,dx\) using sampling density
\(f(x)=1-|x|\). Is this a good choice for the
sampling density?
- Compare the three methods and describe what you observe.
|
Tue Feb 18 8am journal for Feb 10-14 due. |
Wed Feb 19 |
- Rate your partner from last week.
- Read about sample(), rep(), %%, %/%, and
replicate().
- Generating the game sampling:
- Read the game description on how the Safety Authority
will sample.
- Write a function cylindersample() that produces
a dataframe with first column coord_x, second column
coord_y, and third column coord_z, and rows with
samples following the distribution described for
sampling one cylinder.
- Write a function allsample() that produces a
dataframe with first column day, second column
cylinder, third column coord_x, fourth column
coord_y, and fifth column coord_z, and rows with
samples following the distribution described for
sampling the entire production of cylinders.
- Run tests to confirm your sampling is working as desired.
|
Fri Feb 21 |
- Game fines:
- Read the game description on how the Safety Authority
will assess fines based on the Ni content.
- Write a function fineNi(Ni) that inputs a vector
of Ni values and returns the fine.
- Based on our analysis of Ni, write a function
predictNi(day,cylinder,coord_x,coord_y,coord_z).
Find the maximum and minimum values of this function.
|
7 |
Mon Feb 24 |
- Find your partner for this week's tasks and journal,
and sit next to them.
- Read the Good Problems
handout on
Logic. Starting with your week 7 journal, you need
to make your logic clear and use logical connectives.
- Read about missing
values, t(), rowMeans(), and colMeans().
- Partner ratings:
- From the StatisticalComputing project, get the
file ratingmatrix.data. Each row is a rater and each
column is the person rated. (Do not try to figure
out which row/column is you.)
- Compute the mean rating each person received,
and the mean rating each person gave.
- If both partners rated accurately, then the sum
of their ratings of each other would be 5. A
generous person would tend to produce higher sums
and a greedy person lower sums. Measure the extent
to which each rater is generous/greedy.
- From the above measures and/or any other
measures you think are relevant, compute a score for
each person.
- Decide a system for translating a score to an
adjustment on the overall journal grade for a
student. The system must be net neutral, meaning
that the sum of the adjustments must be
zero. Explain why you think your system is fair.
|
Tue Feb 25 8am journal for Feb 17-21 due. |
Wed Feb 26 |
- Rate your partner from last week.
-
Monte Carlo estimate of the expected Ni fine:
- Formulate the expected value of the Ni fine as
an integral of some function over some random
variable.
- Use Monte Carlo integration to estimate this integral.
(Don't break the sage cloud.)
- Bootstrapping:
- Read about Bootstrapping. Summarize
the method in your own words.
- Read about the boot package and its functions
boot() and boot.ci().
- Use boot() and boot.ci() to study the empirical
distribution of the fine. Is your Monte Carlo sample
of the fine sufficient to determine the expected fine within
a reasonably narrow range? What else do you observe?
|
Fri Feb 28 |
- Look at StatisticalComputing/Projects/guidelines.sagews.
Over break, think about what you would like to do
for your final project.
- 5530 students: Should we really have a final exam,
or should we substitute something else? Any substitute
should be done alone and should be beyond what the
undergaduates are required to do. Possibilities include:
- You do your project individually and include an
analysis of the theory behind one of the
computatonal techniques you used. This would count
as both your project and exam.
- You do a short paper with the background and
theory of a computational technique. This would
replace your exam.
Comment and/or make suggestions in your exploration.
- Jackknife:
- Read about the Jackknife.
Summarize the method in your own words.
- From our Monte Carlo data on the fine for Ni,
calculate the Jackknife estimate of the expected
fine. How does it compare to the original Monte
Carlo estimate of the expected fine?
- Compute the Jackknife estimate of the standard
error and compare it to the bootstrap estimate of
the standard error.
|
Spring Break |
8 |
Mon Mar 10 |
- Find your partner for this week's tasks and journal,
and sit next to them.
- Upload talktemplate.tex and OHIOCLR.pdf; open
talktemplate.tex.
- Look in StatisticalComputing/partners.sagews for
your topic. Modify talktemplate.tex into a short but
coherent talk on this topic. Your main product is the
resulting slides; in your journal comment briefly on
problems you encountered, cool things you learned, etc.
|
Tue Mar 11 8am journal for Feb 24-28 due. |
Wed Mar 12 |
- Rate your partner from week 7.
- Miscellaneous plotting tools:
- Read about qqplot(). Generate
- samples from a normal distribution,
- samples from a normal distribution but with
different mean and variance, and
- samples from a different distribution (such
as gamma).
Apply qqplot() to each pair of data and describe
what you observe.
- Apply pairs() to the Destructive data set and
describe what you observe.
- Plot Ni versus Cr and describe what you observe.
Read about the cloud() function in the package
lattice. Plot coord_z versus Ni and Cr and describe
what you observe.
|
Fri Mar 14 |
Instead of coming to class, go to the \(\pi\)-day poster
session and/or the Colloquium. Write about it in your journal.
|
9 |
Mon Mar 17 |
Work on figuring out what to do your project on and with
whom you will work. Once that is settled, start working on
the project.
|
Tue Mar 18 8am journal for Mar 10-14 due. |
Wed Mar 19 |
- Rate your partner from last week.
- From now on you will work with your project
partner(s). (If you are doing your project alone, you
can work with a partner on the journals.)
- Read the Good Problems handout on
Flow. Starting with this journal, pay extra
attention to using complete sentences and paragraphs and
having text bind your journal together.
- Markov Chain Monte Carlo methods:
- Read about Markov
Chain Monte Carlo methods. Read about it again,
this time slower and more carefully. Summarize in
your own words.
- Read about the Metropolis-Hastings
algorithm. Summarize in your own words.
- Run install.packages("mcmc") to install the mcmc
package and then library(mcmc) to load it. (If you
have trouble, see the manual
install instructions and use the lib.loc option
in library().) Read about its metrop() function.
- Let \(f(x)=\left\{\begin{array}{ll} 1-|x|
& -1\le x \le 1\\ 0 &
\text{otherwise}\end{array}\right.\). Use metrop()
to produce samples from the distribution \(f\). Plot
the samples produced versus their index to see how
the Markov Chain moves around. Plot the resulting
density to see how well the sampling worked.
- Let \(g_A(x) = \frac{f(x) + f(x-A)}{2}\). Use
metrop() to produce samples from the distribution
\(g_A\) for \(A=1,3,5\). For each \(A\) plot the
samples versus their index and the resulting
densities. Explore the options to metrop() to make
the sampling and resulting densities better.
|
Fri Mar 21 |
Work on your project. Set some goals.
|
10 |
Mon Mar 24 |
- Gibbs sampling:
- Read about Gibbs
sampling. Summarize in your own word.
- Write a Gibbs sampler to construct samples from
the uniform distribution on a disc of radius 1. Plot
the resulting points.
- Read about the kde2d() function in the MASS
package; apply it to your samples. Plot the
resulting density using contour(), image(), and
persp().
- Let
f(x,y)=unif(x,min=0,max=5)unif(y,min=x,max=2x+1). By a
method of your choice, construct samples from this
distribution. Plot the points and/or density by a method
of your choice.
|
Tue Mar 25 8am: I will look at what you have
so far for your project, but not grade anything. (No
exploration due.) |
Wed Mar 26 |
Work on your project.
|
Fri Mar 28 |
(drop deadline with WP/WF)
- Reverse-engineering data:
- Get the discrete data set unknowndiscrete.dat.
Determine which discrete probability distribution
(one of binomial, geometric, hypergeometric,
poisson, negative binomial) was used to generate it
and with what parameters.
- Get the continuous data set unknowncontinuous.dat.
Determine which continuous probability density (one
of uniform, normal, gamma, beta) was used to
generate it and with what parameters.
|
11 |
Mon Mar 31 |
Work on your project. |
Tue Apr 1 8am journal for Mar 19, 24, and 28 due. |
Wed Apr 2 |
- Read the Good Problems handout on
Graphs. Starting with this journal, pay extra
attention to titles, labels, and legends on your graphs.
- Maximum Likelihood Estimators:
- Read about Maximum Likelihood Estimators. Summarize in your own words.
- Read about the mle() function in the stats4 package.
- Using the continuous data set unknowncontinuous.dat
that we studied last week, select the density type
that you determined and use mle() to compute the
maximum likelihood estimate of the
parameter(s). Plot the density function using these
parameters along with the density from the data to
see how well it worked.
- Using the discrete data set unknowndiscrete.dat
that we studied last week, select the distribution
type that you determined and use mle() to compute
the maximum likelihood estimate of the continuous
parameters. If the distribution type you selected
has discrete parameters (like n), then fix them using
the 'fixed' option and manually try a few values
near your estimate from last week; which value is
most likely to be true? Plot the distribution
function using the parameters you determined and the
normalized table from the data to see how well it
worked.
|
Fri Apr 4 |
- Work on your project.
- 5530 students look at the template finaltemplate.tex for your
final exam/paper and discuss with me.
|
12 |
Mon Apr 7 |
- Expectation Maximization:
- Read about the EM
algorithm and especially its use for Gaussian
mixtures. Summarize the method in your own
words.
- Install the package mclust, load it, and read
about its em() function.
- Generate samples from the (Gaussion mixture)
density 0.3*normal(0,1)+0.7*normal(5,2). Forget that
you know the parameters 0.3, 0, 1, 0.7, 5, and 2,
and use em() to try to recover them.
|
Tue Apr 8 8am rough draft of project report
due; it counts as a journal. (No exploration due.) |
Wed Apr 9 |
Work on your project.
|
Fri Apr 11 |
-
More miscellaneous plotting tools:
- Plot the data x=c(1,2,4,5,9) y=c(0,-1,3,3,1)
using each of the 9 options for type. Use layout()
to show in a 3x3 grid.
- Pick your favorite type (not "n") and plot with
a title, subtitle, x-label, y-label, and larger x
and y limits.
- Read help(plotmath). Repeat the above plot, now with
some of the labels mathematical expressions; use at
least some powers and greek letters.
- Repeat the above plot, making it colorful. Use
abline() to add a thick green line \(y=0.5x-1\).
- Read about pie(), barplot(), and rainbow(). Make
a pie chart and a barplot of the y values colored by
rainbow; use layout() to show them side by side.
- Use boxplot() to make a boxplot of x and
y. Title and label it.
- Make a histogram of the x values and a second
histogram with too many breaks; use layout() to show
them side by side.
- Plot the density of the x values and a second
density with too small width parameter; use layout()
to show them side by side.
|
13 |
Mon Apr 14 |
Project presentations |
Wed Apr 16 |
Project presentations |
Fri Apr 18 |
Project presentations |
14 |
Mon Apr 21 |
- Create a data set D with samples from the uniform
distribution on the unit disc.
- Create a data set G with samples from the
two-dimensional normal distribution with \(\mu=[0,0]\)
and \(\Sigma=\left[\begin{array}{cc}1&0\\
0&1\end{array}\right]\).
- For each of these:
- Determine (by thinking) the density function for
the x values and the density function for the y
values. Are they the same?
- Determine (by thinking) whether x and y are
independent.
- Test computationally whether or not x and y have
the same distribution.
- Test computationally whether or not x and y
are independent.
|
Wed Apr 23 |
|
Thu Apr 25 8am journal for Apr 2, 7, 11, and 21 due.
(5530 students: draft of final paper due; counts as an exploration.) |
Fri Apr 25 |
Clean-up |
15 |
Fri May 2 |
Final Exam 3:10-5:10pm in our regular
classroom. Project reports and slides due. (5530 papers due.)
|