Go into your proj4-XX-XX folder on a hive computer (it's okay if you're ssh'd in).
Create a branch called "part2".
git checkout -b part2
Pull proj4 starter files onto that branch.
git pull proj4_starter part2
Push the branch to Github, so you can later pull "part2" into your EC2 instance.
git push origin part2
I like to think of the folder we're creating now as sort of a "base camp" from which to log into Amazon EC2.
git clone git@github.com:cs61c-spring2015/proj4_ec2.git
cd proj4_ec2 # Go inside the directory that was created
Donggyu has set up a cool script to automatically generate logins for us! These logins (when they work) seem to magically work with the make
commands to let us do stuff on EC2 from the Hive.
make account
The problem is that the account generation is, in Donggyu's own words, a little hacky. We're told to just call make clean
then make account
over and over until it doesn't error. But that's what scripts were made for! Use this neat shell script Alex Fu wrote.
#!/bin/bash
i="1"
while [ $i -ne 0 ] ; do
  make clean
  make account | tee testing.txt
  i=`grep "403 InvalidClientTokenId" testing.txt | wc -l`
  # cat $testing
done
Just copy and paste it into the terminal, and it'll stop when you have a working account.
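If you'd rather not loop forever when something else is broken, here's a rough sketch of the same idea with a cap on the number of tries. The name retry_until_clean and the cap are my own inventions, not part of Donggyu's or Alex's scripts, and it assumes (like the script above) that a bad account shows up as "403 InvalidClientTokenId" somewhere in the output.

```shell
# Hypothetical helper, not part of the course scripts.
# Runs a command up to max_tries times and stops as soon as its output
# no longer contains the given error pattern.
retry_until_clean() {
  local max_tries="$1" pattern="$2"
  shift 2
  local log attempt=1
  log=$(mktemp)
  while [ "$attempt" -le "$max_tries" ]; do
    # Capture stdout and stderr so the error is seen wherever it is printed.
    "$@" 2>&1 | tee "$log"
    if ! grep -q "$pattern" "$log"; then
      rm -f "$log"
      echo "attempt $attempt: output looks clean"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  rm -f "$log"
  echo "gave up after $max_tries attempts" >&2
  return 1
}

# Usage, from inside proj4_ec2 (20 is an arbitrary cap):
#   retry_until_clean 20 "403 InvalidClientTokenId" sh -c 'make clean; make account'
```

Note that "clean" here only means the 403 string is absent, so it's still worth eyeballing the output of the last attempt.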
Start renting a cluster on EC2.
make launch-test
If this screws up at any point, just call:
make resume
until it works.
Now you've started renting your much smaller cluster for the sake of testing that it all works fine.
Note 1: Watch the time, because every minute you're renting it costs money.
Note 2: You can call make master
and get a nice UI in the browser to look at information about the cluster you've rented. I left this step out because I never got it to work for me and I don't think it's required to complete the project. (Thanks Maya for pointing it out!)
Now you're ready to log in (basically ssh-ing in) to the cluster you rented.
make login
Now copy your code into the cluster so you can actually run it.
git clone https://github.com/cs61c-spring2015/proj4-XX-YY.git
cd proj4-XX-YY
git checkout part2
Now we need to get Cython and other dependencies onto our cluster machine. Run Donggyu's shell script to do this automatically.
./setup_ec2.sh
Finally, we can run some test code.
make ec2-cnn-test
Look at how long this takes - if it's too slow then you may need to change something before you actually run your code on the massive data sets.
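If you want a wall-clock number for the whole run, a tiny wrapper like the one below can help. This is just a sketch: timed is a name I made up, and it only measures whole seconds from the moment the command starts to the moment it exits.

```shell
# Hypothetical helper: run any command and report how long it took.
timed() {
  local start end rc
  start=$(date +%s)          # seconds since the epoch, before the run
  "$@"
  rc=$?                      # remember the command's own exit status
  end=$(date +%s)
  echo "'$*' took $((end - start)) s (exit status $rc)" >&2
  return "$rc"
}

# Usage on the cluster:
#   timed make ec2-cnn-test
```

The timing line goes to stderr so it doesn't get mixed into anything you redirect from stdout.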
Now that you are done, you need to kill your cluster and end the rental.
make destroy
This is just the same as step 1, except that instead of running make ec2-cnn-test
, you're running 3 much larger benchmarks and recording the results.
As in step 1, we need to start renting our cluster from Amazon:
make launch-small
If this screws up at any point, just call:
make resume
until it works.
Now we go into our cluster to run the benchmarks.
make login
Now copy your code into the cluster so you can actually run it.
git clone https://github.com/cs61c-spring2015/proj4-XX-YY.git
cd proj4-XX-YY
git checkout part2
Now we need to get Cython and other dependencies onto our cluster machine. Run Donggyu's shell script to do this automatically.
./setup_ec2.sh
Now we run tests on 3 benchmarks:
make ec2-cnn-large
, make ec2-cnn-huge
, and make ec2-cnn-full
- or you can just call another shell script Donggyu wrote for us:
screen
./batch.sh
Donggyu's command will save all output to a log file on your cluster (not the Hive machine). The same information will pop up on your terminal so you can also just copy and paste it.
Broken Pipes: If you get a broken pipe or some other ssh-related error, don't kill your cluster, or you'll regret it like I am right now! You can still get your completed logs! You called screen
(a terminal multiplexer, explained in the original spec), so you can just go back into your Hive machine and call screen -r
and you'll be brought back into your cluster like nothing ever happened.
How to Read From Logs: If you plan on reading your information from the logs instead of copying and pasting from the terminal, you'll need to copy the log files from your cluster back to your class account:
scp ec2-cnn-full.log cs61c-XX@hiveY.cs.berkeley.edu:~/(destination folder)
Obviously just switch out ec2-cnn-full.log for ec2-cnn-huge.log, or whatever log file you want to copy over. (Thanks Oliver for mentioning this process)
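If you want all the logs at once, something like this sketch would copy every file matching ec2-cnn-*.log in the current folder. The function name copy_logs, the glob, and the optional second argument (a stand-in for scp, so you can do a dry run with "echo scp") are all my own assumptions, not something from the spec.

```shell
# Hypothetical helper: copy every ec2-cnn-*.log in the current directory.
# $1 is the scp destination; $2 optionally replaces scp (e.g. "echo scp"
# to print the commands instead of running them).
copy_logs() {
  local dest="$1"
  local runner="${2:-scp}"
  local f
  for f in ec2-cnn-*.log; do
    [ -e "$f" ] || continue    # skip the literal pattern if nothing matches
    $runner "$f" "$dest"       # unquoted on purpose so "echo scp" splits
  done
}

# Dry run first, then the real thing (fill in your own login and folder):
#   copy_logs 'cs61c-XX@hiveY.cs.berkeley.edu:~/' 'echo scp'
#   copy_logs 'cs61c-XX@hiveY.cs.berkeley.edu:~/'
```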
Either way, you will get benchmark results for each that look something like this:
cnn train-large 8000 2> ec2.log
[CS61C Project 4] start classifier training
iteration: 0, loss: X, time: X sec
iteration: 1, loss: X, time: X sec
iteration: 2, loss: X, time: X sec
[CS61C Project 4] done training
[CS61C Project 4] test the classifier
[CS61C Project 4] accuracy: X
[CS61C Project 4] training performane: X imgs / sec
[CS61C Project 4] time elapsed: X min
Record them in some way, because you'll need them to write your answers in part2.txt.
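One way to record them, if you have the log files, is to grep out just the summary lines. This is a sketch that assumes your logs contain lines in the format shown above; summarize_log is a name I made up.

```shell
# Hypothetical helper: print only the accuracy, throughput, and elapsed-time
# lines from one benchmark log.
summarize_log() {
  grep -E 'accuracy:|imgs / sec|time elapsed:' "$1"
}

# Usage, for every log in the current folder:
#   for f in ec2-cnn-*.log; do echo "== $f"; summarize_log "$f"; done
```

It matches on "imgs / sec" rather than the label in front of it, so the exact spelling of that label in the output doesn't matter.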
Once you have your precious benchmark results, you should kill your session on the cluster (remember that every minute renting the cluster is costing the school money).
make destroy
This is almost exactly the same as step 2, except that:
As with every other step, we need to start renting our cluster from Amazon:
make launch-big
Call make resume
as needed.
From here, you know the drill:
make login
git clone https://github.com/cs61c-spring2015/proj4-XX-YY.git
cd proj4-XX-YY
git checkout part2
./setup_ec2.sh
screen
./batch.sh
Record your results for:
Kill the session.
make destroy
I'm going to link you back to the spec for this last step: link. I hope this less confusing spec has helped you!