DS 401 : Statistical modeling for
data science : Fall 2021
MWF 8-11:40 am, CST 103
Instructor
|
|
Office hours
|
The following hours are tentative;
I'll finalize office hours
after the first week.
M: 1-2 pm. T: 1-2 pm. W: 1-2 pm.
And by appointment or walk-in.
The best way to contact me, in order of
preference, is: [1] in person,
[2] by email, [3] by phone.
Open door policy:
I keep my posted office hours to
a bare minimum, to avoid being locked into a rigid schedule
all semester. However, I am happy to assist students well
beyond my office hours. Students are encouraged to just
drop by whenever needed.
Anytime my
office door is open you're welcome to stop by and check whether
I am available.
Also, please do not hesitate to make an appointment
if my posted office hours don't work for you.
|
Class website |
https://cs.earlham.edu/~pardhan/courses/ds401/
The website is a central component of this
class, and you are responsible for regularly checking it for
announcements, homework assignments and various
supplementary handouts. I prepare for class with the assumption
that students have reviewed the website and followed through on
posted instructions.
|
Textbook and reference materials
We will use various open-source,
online materials. Our primary textbook and reference
resource will be:
OpenIntro Statistics, 4th Edition, 2020,
by David Diez, Mine Cetinkaya-Rundel, Christopher Barr,
and openintro.org.
Other references that will be useful
for certain topics include:
Principles and techniques of data
science, 2021,
by Sam Lau, Joey Gonzalez, and Deb Nolan.
Introductory statistics, 2018,
by Barbara Illowsky, Susan Dean,
and openstax.org.
|
Course credits and workload
This course is worth 4 credits, and
will meet for in-person classes for 400 minutes each week
for 7 weeks.
This totals 2,800 minutes of class time, the same as a standard
4-credit course meeting for 200 minutes per week during a regular
14-week semester. In addition, students should expect a
workload outside class
of about 15 hours each week.
|
Requirements this class fulfills
This class is required for majoring
in data science. In addition, it fulfills the Quantitative Reasoning
component of Earlham's General Education requirements.
|
Description & objectives
The breadth and diversity of real-world
data science projects make it nearly impossible to
devise standardized procedures and recipes that would be
applicable to all of them.
In spite of this diversity, however, it is possible to identify
certain common components that frequently arise in the life
cycle of most data science projects. The schematic diagram
below shows key components in the life cycle of a typical
data science project.
The main objective of this course is to introduce certain
essential methods, tools and theoretical concepts
relevant to the data analysis, data modeling, and model
evaluation phases of the life cycle. In particular, we will
focus on statistical
methods designed for studying patterns and relationships
within data sets, and for modeling such relationships. In
addition, we will learn how to use such models for forecasting
and decision-making, as well as
critically examine questions that relate
to quality, reliability, and effectiveness of data models.
In terms of software and implementation aspects,
an important goal of this course is to train students in
the use of R.
Key topics this course will cover include: a review of
probability models, distributions, and the central limit
theorem; linear regression with two variables;
linear regression with many variables; logistic
regression; diagnostics and inferences for regression.
If time permits, we will also cover topics in principal
component analysis and time series regression.
|
Student learning goals and outcomes
Upon successful completion of this
course, students will be able to:
1. Understand the key components in the life cycle of typical data science projects.
2. Carry out basic statistical analysis of structured data sets, including exploring and summarizing key trends and patterns.
3. Develop models for studying relationships between variables, including linear regression, multiple linear regression, and logistic regression.
4. Evaluate the quality, reliability, and effectiveness of regression models.
5. Use the R suite of software tools to carry out statistical analysis and modeling.
These aspirations broadly support all
7 learning goals
of an Earlham education (see the Appendix attached
to this Syllabus).
|
Course prerequisites:
Elementary Statistics (MATH 120) or Mathematical
Statistics (MATH 300), and Calculus A (MATH 180).
|
Assessment & grading policy
Your final grade will be based on combined
performance on: quizzes and classwork,
lab projects, homework problems,
one mid-term exam, and a final
exam. Each will contribute the
following proportions:
Quizzes & Classwork | 30% |
Lab projects | 15% |
Homework | 15% |
Mid-term exam | 20% |
Final exam | 20% |
Letter grade boundaries for this course
are not set in advance. They will be determined at the end of
the term, based on factors such as overall class
performance, level of difficulty of tests, quizzes, and assigned
work, etc.
At a minimum, the following standard scale
for letter grades will be honored:
A+: 97.0-100; A: 93.0-96.9; A-: 90.0-92.9;
B+: 87.0-89.9; B: 83.0-86.9; B-: 80.0-82.9;
C+: 77.0-79.9; C: 73.0-76.9; C-: 70.0-72.9;
D+: 67.0-69.9; D: 63.0-66.9; D-: 60.0-62.9;
F: below 60.
NOTE that all students must also satisfy the
following minimum requirements to receive a grade of C- or better:
* Take both exams (mid-term and final).
* Turn in at least 75% of the homework problems.
* Turn in at least 75% of the quizzes and classwork.
* Complete all the lab projects.
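
As a rough illustration of how the weights and the minimum scale above combine, here is a minimal Python sketch. The function names and the sample scores are hypothetical, and the sketch assumes each category score is already a percentage between 0 and 100. It only reports the letter grade guaranteed by the standard scale; the actual boundaries are set at the end of the term and may be more generous. It does not check the minimum requirements for a C- or better.

```python
# Category weights from the table above, as fractions of the final grade.
WEIGHTS = {
    "quizzes_classwork": 0.30,
    "lab_projects": 0.15,
    "homework": 0.15,
    "midterm": 0.20,
    "final": 0.20,
}

# Lower bounds of the standard letter-grade scale, highest first.
SCALE = [
    (97.0, "A+"), (93.0, "A"), (90.0, "A-"),
    (87.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (73.0, "C"), (70.0, "C-"),
    (67.0, "D+"), (63.0, "D"), (60.0, "D-"),
]


def weighted_score(scores):
    """Combine per-category percentages (0-100) into one overall percentage."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)


def minimum_letter(score):
    """Return the letter grade guaranteed by the standard scale."""
    for lower_bound, letter in SCALE:
        if score >= lower_bound:
            return letter
    return "F"


# Hypothetical student, for illustration only.
sample = {
    "quizzes_classwork": 90,
    "lab_projects": 80,
    "homework": 85,
    "midterm": 75,
    "final": 88,
}
overall = weighted_score(sample)
print(round(overall, 2), minimum_letter(overall))
```

For the sample above, the weighted score works out to about 84.35, which falls in the B range of the standard scale.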
|
More details about assessment categories
Quizzes and classwork:
In-class quizzes and/or classwork
will be frequently assigned throughout the semester.
A key purpose of the quizzes is to help us accomplish
learning goals 2 through 4 (listed above). Quizzes
will typically be short (e.g., 10 minutes), and will focus on
conceptual understanding of key ideas.
Classwork, on the other
hand, will be much like homework problems, and will serve
the purpose of hands-on learning and practice in class. This
will help us accomplish learning goals 1 through 4.
Students will sometimes do classwork in teams and, in such
cases, turn in a common "team solution" for grading.
Lab projects:
The goal of labs
is to learn through focused, hands-on exploration of
application areas, and to deepen conceptual understanding.
In addition, labs will play a key role in helping students
learn how to use R.
Collectively, the labs will broadly support all 5 learning
goals.
Homework:
The purpose of homework is to help
you learn course content and to give you practice
applying concepts and solution techniques.
Exercises will be assigned from the textbook and other
sources at various points throughout the semester.
These must be turned in at the beginning of class on the
indicated due dates. Homework exercises
will help us accomplish learning goals 2 through 4.
Exams:
There will be one mid-term exam during the term, plus a
final exam at the end of the term. The exams will help
fulfill and assess goals 1 through 4.
The tentative date of the mid-term exam is
October 27.
The final exam date and time is set by the
registrar's office.
According to their calendar, the final exam
will be held
Monday, Nov. 22, at 9 am.
|
Important dates
* Last day to
add this course: Oct. 6.
* Last day to drop: Nov. 5.
* Date of final exam: Nov. 22.
NOTE: Last drop date applies to Earlham students only.
Students cross-registered through IU-East or other institutions must
follow the dates and rules of their own institution.
|
Academic integrity
After several years of writing standard,
boiler-plate stuff in this section, I have decided to replace it with
a more authentic message from my heart to yours. Before getting
into details, I would like to share 3 key ideas that profoundly
shape my thinking, and prompt me to explore more effective ways
towards academic integrity:
- Academic infractions are a much bigger problem at
Earlham than many of us would like to believe or admit.
- The problem is NOT our students!
Earlham students are as good as (or better than!) their peers at
other institutions in terms of moral values and ethical standards.
- Infractions at Earlham can be significantly reduced using
a combination of strategies, collectively developed by students
and faculty.
These three points summarize my overall perspective, and
will frame the rest of my discussion on this subject.
By far the
single biggest phenomenon that has radically transformed today's
academic integrity / infraction landscape is technology,
particularly the internet and cell phones.
In my view, Earlham's
traditional approach to academic integrity has been rendered
completely obsolete by these technologies. If I were an Earlham
student today, I would encounter many situations where the
temptation to infract would be extremely high, because these
technologies make it so easy, and the risk of getting caught is
virtually zero.
This is the main reason why I say that you, the
student, are not the problem. You are human, just like me
and my faculty colleagues. It is a fact of life that many humans
succumb to temptation when the rewards are sufficiently high,
and the risks sufficiently low.
Yet the fact remains that a growing rate of
academic infractions is a terrible thing
to ignore: they sink an institution's reputation, decrease the
value of students' education, lower student & faculty
morale, and more. Clearly, we need to explore and develop
new strategies that are more effective for our times, and also
preserve Earlham's distinctive approach to such matters. We
will set aside some class time to discuss and formulate
specific policies for helping students (joyfully!) meet and exceed
the highest standards of integrity in this class. In the meantime,
I invite you to reflect on some practical ways that would most
help and support you in avoiding the use of inappropriate
sources for completing and turning in your graded work.
I would like to conclude with the following excerpt
from the Earlham Academic Integrity Policy:
"The College trusts students who enroll
at Earlham to be
honest seekers of truth and knowledge. This trust is extended to
all students by other students and by teachers ...
Giving or receiving aid inappropriately on
assignments and tests, or plagiarizing by using another person's
words or ideas without credit, constitutes a serious breach of our
trust in one another and in the integrity of the search for truth.
Those who believe they have witnessed violations of academic
integrity should feel the obligation to speak about this to the
suspected offender. The witness also should feel obligated to
report the suspected offender to the instructor if the person
fails to offer a satisfactory explanation and refuses to report
him or herself. ...
Violations of academic integrity, because they undermine our
trust in one another and in the credibility of the academic
enterprise, are taken very seriously. Penalties for violations
range from failing assignments or tests to suspension or expulsion
from the College."
|
Makeups
In-class items: There will be no makeup for missed
in-class items (e.g., quizzes, classwork, class participation,
etc.) regardless of reason. I will drop your lowest two scores
as an implicit way of making up for missed items.
Homework: Past-due assignments will not be
accepted except in rare circumstances, and only if the student
receives prior consent from the instructor.
Exams: Make-up exams will not be given
except in cases of documented illness or emergency.
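
To make the in-class policy above concrete, here is a minimal Python sketch of dropping the lowest two scores. It rests on assumptions the syllabus does not spell out: that a missed item counts as zero, that all quiz and classwork items are pooled and weighted equally, and that the function name and sample scores are hypothetical.

```python
def average_after_dropping(scores, n_drop=2):
    """Mean of the scores that remain after discarding the n_drop lowest."""
    kept = sorted(scores)[n_drop:]
    if not kept:
        raise ValueError("need more scores than the number dropped")
    return sum(kept) / len(kept)


# Hypothetical quiz/classwork percentages; the two zeros are missed items.
in_class = [0, 85, 90, 0, 70, 95]
print(average_after_dropping(in_class))  # mean of 70, 85, 90, 95 -> 85.0
```

In this example the two missed items are exactly the ones dropped, so they do not lower the average.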
|
|
Academic accommodations
Students with a documented disability
(e.g., physical, learning, psychiatric, visual, hearing, etc.)
who need to arrange reasonable
classroom accommodations must request accommodation memos
from the Academic Enrichment Center (main floor of Lilly
Library) and contact their instructors each
semester. For greater success, students are strongly encouraged
to visit the Academic Enrichment Center within the first two weeks
of each semester to begin the process. For further details, please visit
https://earlham.edu/academics/academic-support-and-special-programs/academic-enrichment-center/accessibility-services/
|
Other sources of help
- The Academic Enrichment Center:
The Academic Enrichment Center (AEC), located in
Lilly Library,
provides assistance with study habits and skills as well
as a peer tutoring service. The AEC is staffed by trained
peer tutors for either pre-arranged group tutoring sessions
(provided for many math, science and social science
courses) or one-on-one tutoring sessions for other
courses. Peer tutoring is a free service offered to all
Earlham students. Please visit
https://earlham.edu/academics/academic-support-and-special-programs/academic-enrichment-center/peer-tutoring/
for more information.
- The Earlham Writing Center:
The Writing Center is dedicated to providing students
with advice and resources about writing. Students can meet
one-on-one with trained consultants who will contribute feedback
to writers at any stage of the writing process: brainstorming,
drafting, researching, revising, and polishing. This is a free, walk-in
service on the main level of Lilly Library.
In addition to dropping by, students may
also schedule an appointment in advance
using the online scheduler found at:
http://www.earlham.edu/writing-center/.
Also, if you want help with specific grammar topics related
to your own writing,
https://www.grammarly.com/edu is available
for all Earlham students to proofread their papers and learn
more about grammatical errors.
|
|