“Without coding, computers would literally do nothing. They would be completely useless. That’s because source code is the set of instructions that tells the computer what to do. Computers have no free will, after all, so without explicit instruction, they’re just sophisticated hunks of metal.” - Travis Addair, Software Engineer, on Quora.


Background Poll

1. On a scale of 0 (not any) to 10 (sophisticated), how to describe your programming experience?

2. Are you familiar with the following languages?

  • FORTRAN
  • C/C++
  • Java
  • Other
  • I know English, maybe some Cantonese

3. Do you use the following languages for data analysis?

  • MATLAB
  • Python
  • R
  • IDL
  • Other
  • No, I don’t

4. What do you use for data visualization?

  • MATLAB
  • Python
  • R
  • IDL
  • Origin
  • Other (but not Excel or similar spreadsheet software)
  • I use Excel, sometimes Power-point

5. Have you heard about Git?

  • Yes
  • No

6. Do you know how to use command-lines?

  • Yes, I know DOS
  • Yes, I know Linux/Unix terminal
  • No, what’s a command-line?

7. What is your laptop’s operating system (OS)?

  • Windows
  • macOS
  • Ubuntu
  • Other

8. In your view, why is programming necessary? (open question)

  • Research
  • Job hunting
  • Just for fun
  • Other

9. In your view, why is programming hard? (open question, source)

  • Oh, I’m just really not a math person
  • Programmers are elite hackers, like in movies
  • I need a computer science degree to write code
  • Programmers should be immediately good at all things about the computer
  • Other

10. Name one thing you want to achieve at the end of the semester (open question)


Course topics to be covered

Research in the ESE field (in fact, any field) is all about data:
drawing
Hopefully, this course will make your life easier when it comes to interpret and visualize data.

Course topics:

  • Three languages: Python, Shell (BASH), and FORTRAN
  • One operation system: Linux
  • At the end of the semester, you should be able to:
    • Use Git for version control
      • Github, GitKraken, etc.
    • Handle common file formats with Python
      • plain text, .csv, .nc, .hdf, etc.
    • Work with data neatly using common packages like numpy, scipy, matplotlib, pandas, and xarray
    • Proceed statistical tests with Python
      • comparing means, ANOVA, regressions, etc.
    • Create publication-quality figures with Python
    • Conduct parallel computing in Python
    • Use the Linux environment
      • basics, BASH scripts, installing software, loading modules, etc.
    • Run jobs on TaiYi high-performance computer
      • SSH, managing jobs, etc.
    • [Optional] Write, compile, and run FORTRAN scripts
      • gfortran, intel-fortran, functions, modules, etc.
    • [Optional] Build and launch a website
      • Rmarkdown (In fact, this website is built with Rmarkdown)

drawing drawing drawing


Course structure

  • Class 09-10
    • New material demonstrated (45-60 mins)
    • In-class exercises (30-45 mins)
  • Class 11
    • Lab exercises (15-30 mins)
    • Working on assignments
    • Discussing the final project
    • One-on-one interactions
  • Office Hours to discuss specific questions

Course grading

  • Assignments (50%)
  • Final project (40%),
  • Class participation (10%)

Take the course or not?

This course is designed for ESE graduate students having NO or WEAK programming background, with emphasis on Python, FORTRAN, Git, and Linux (BASH).

Good reasons for NOT taking the course:

  • After browsing the syllabus and schedule, there is nothing new for me
  • I have a strong programming background with Python or other tools (C, C++, Java, MATLAB, R, IDL, etc.) already
  • I am able to model, analyze, and visualize data sets through programming efforts already
  • I want to learn other programming languages (C, C++, Java, MATLAB, R, IDL, etc.)
  • I want to take an “easy” class to meet the graduation requirement
  • I don’t want to spend much time on the course; instead, I want to focus more on my research

Bad reasons for NOT taking the course:

  • I don’t have (any) coding/programming experience/background
  • Programming is intimidating; perhaps I can never master it
  • My project requires no coding/programming skills, I can finish all analysis with Excel

By taking the course, you commit yourself to work on time-consuming assignments and final project.


A closer look at programming process

Very often, we find programming difficult because we fail to break down the initial research question into small parts, which I refer as conceptual model, math or logic model, and computer model.

The most crucial step is to have a well-defined conceptual model. This involves simplifying and generalizing your initial question so that you can describe the gist in one sentence. Once you have a well-defined conceptual model, building a math or logic model would become straightforward. And coding is just the last step of the programming process where you use suitable languages and tools to let the computer know what you exactly want to do.

drawing

The above process of programming, however, can not be ideal as you can probably not make it with one shot. In reality, you need to modify those models iteratively before finally solving the question, so the chart would be more like as follow.

drawing

An example: Estimation of \(\pi\)

Now suppose our research question is:

Question: How to estimate the number of \(\pi\)?

Your ideas

Can you think about a few ways to solve this?

Hints

If r is the radius of a circle, then the surface of this circle is \(\pi*r^{2}\). Therefore, if we know the radius of a circle, and the surface of this circle, we can estimate \(\pi\). Then how can we estimate the surface of a circle?

A circle (red) in a square (black) is our solution. Indeed, this is very easy to know the surface of a square. It is the square of the length of one of its sides. Do you see where this is leading us? What if we were able to estimate the rate between the surface of the circle and the surface of the square? Then we would know the surface of the circle.

drawing

Finally, we just need to estimate (surfaceCircle/surfaceSquare). We can apply the Monte Carlo method here. If we uniformly drop N random points (blue) in the square and count how many of them are in the circle, then we have a proxy of the quantity surfaceCircle/surfaceSquare. (source: Estimation of the number PI – A Monte Carlo simulation)

The conceptual model

Following the above idea, the conceptual model would be: The proportion between the number of points within the circle (n) and the square (N) is equal to surfaceCircle/surfaceSquare.

The math model

Simply translating the conceptual model, we have \[n/N = \pi / 4\]

The computer model

We will learn how to do this with python in later sections.


In-class exercises

  • Brainstorm one research question

  • What is the conceptual model?

  • What is the math/logic model?