Put @SuppressWarnings("unused") before the method.

Alt-Shift-J in Eclipse puts a JavaDoc stub above the current method.

Ctrl+Shift+O (Organize imports).

The JavaDoc stub looks like this:

class Trash {
    /**
     * @param condition
     * @param shockSetting
     */
    public void trial(int condition, float shockSetting) {
        System.out.println("Foobar");
    }
}

You then fill in the rest to something like:

class Trash {
    /**
     * Runs one trial of the experiment.
     * @param condition The experimental condition code
     * @param shockSetting Voltage applied to the material
     */
    public void trial(int condition, float shockSetting) {
        System.out.println("Foobar");
    }
}
mvn archetype:generate -Dfilter=quickstart \
    -DgroupId=edu.stanford.arcspread.mypackage \
    -DartifactId=MyProject \
    -DpackageName=mycode

This will generate a tree with the following path to where your code
then goes:

MyProject/src/main/java/edu/stanford/arcspread/mypackage

Your code goes into mypackage. The process will have put a
file called App.java in that directory.
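For reference, that generated App.java is just a hello-world placeholder,
roughly like the sketch below (the exact package name and contents depend
on your archetype version and the -D values you passed):

package edu.stanford.arcspread.mypackage;

/**
 * Placeholder class created by the quickstart archetype.
 * Rename or replace it once you add your real code.
 */
public class App {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}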
(For PhotoSpread, for example, the groupId, artifactId, and package were
PhotoSpread, PhotoSpread, and edu.stanford.photoSpread.)
The '-D' passes an argument into a Java program. Maven's
command 'mvn' is a Java program.
Let's make our groupId be ArcSpread. Your individual projects
will each have a different artifactId, which you can invent. For example:
wordBrowser. Let's have all our packages start with
edu.stanford.[yourArtifactName]
Once you've issued the above command for your artifact,
you'll have a pom.xml file in the root of your new tree. That's
where any dependencies on outside libraries are recorded.
If you put code into [rootdir]/src/main/java/..., which
has been created for you, you just cd to your [rootdir]
and run:
mvn compile
For actions other than compile (like 'compile your test code,' 'run the
tests,' 'package your stuff into a jar file that I can run on my machine,'
etc.), look for the keyword 'lifecycle phases' in the Maven literature.
Some other useful Maven commands:
mvn install
mvn test-compile
mvn exec:java -Dexec.mainClass=photoSpread.App
  (Must be run in the project root dir. In place of App you can name any
  other class that contains a main() method.)
mvn site
mvn test
mvn eclipse:eclipse
/src/main/resources (where non-Java resource files go)
http://download.eclipse.org/technology/m2e/releases (update site for the
Eclipse m2e plugin, which provides Maven integration in Eclipse)
mvn help:describe -Dplugin=exec -Dfull
- Once you set up your Maven directory structure, get a Github URL
from Andreas, then enter the following on the command line in your
project root directory on your local machine (say the Github address
is git@github.com:paepcke/ArcSpreadUX.git):
git init
git add *
git commit -a -m "Initial, empty Maven repository."
git remote add origin git@github.com:paepcke/ArcSpreadUX.git
git push origin master:master
- git branch // Which branch am I on, and which branches exist?
- git checkout [branch] // Make my working tree be 'branch'
- git status // Is there anything to commit to the local repo?
- git commit -a -m "Description of what this commit changes in the code."
  // Commit all newly added (staged) files, plus tracked files that
  // have changed, to the local repo.
  // Note: in many shells you can hit return during your commit message,
  // before you close the quote. Doing this will make your commit
  // messages much more readable.
- git push origin [myBranchName]:[myBranchName]
// Push my current branch back to the remote repo,
// creating a branch of the same name there, if such a branch
// doesn't exist yet.
- git pull origin [remoteBranchName]:[remoteBranchName]
- git reset --hard HEAD // Throw away changes I made since my last commit.
- git mergetool // Invoke the default merge tool you set up earlier.
  // For example, to make the diffuse tool the default:
  // git config --global merge.tool diffuse
You can find your own style of working with Git, but when I develop on
my own, I feel safest just keeping a straight line of branches that I
name by the dates I worked on them. Like this: assume it's Oct 12,
2011, and I have a branch called Sep25_2011, which is my currently
checked out (i.e. active) branch. I start the day doing this:
git branch Oct12_2011
git checkout Oct12_2011
Now I change the code. When I'm done for the day, I do:
git push origin Oct12_2011:Oct12_2011
This will create a new branch in the remote repo, with the same
name as the local branch I created in the morning.
It's an extremely conservative use of Git, but it works for me. Feel
free to be more adventurous, creating parallel branches, and merging
them.
After some research I decided on diffuse as my file diff viewing and
merging tool. To make git use diffuse for the 'git mergetool' command
(after you've installed diffuse):
git config --global merge.tool diffuse
Let master be the current stable branch:
- git checkout master // to get to the latest from repo
- git branch Oct12_2011
- git checkout Oct12_2011
- make changes
- git push origin HEAD:master
git status
git log --pretty=short
git branch
git remote show [depotName]
git remote show origin
git rm --cached [fileName]
git ls-remote origin
git reset --hard HEAD
git ls-files --others --exclude-standard // list untracked files that are not ignored
au = !git add $(git ls-files -o --exclude-standard) // .gitconfig alias: 'git au' stages all such files
git diff
git diff HEAD
git diff --cached
git fetch
git log HEAD..origin
// to show the log entries between your last common commit
// and the origin branch.
To show the diffs, use either
git log -p HEAD..origin to show each patch, or
git diff HEAD...origin (three dots not two) to show a single
diff.
git log --pretty=short
git show [branch]:[fileName] // show a file as it is on another branch, e.g.:
git show Trash:src/foo.txt
index.html
file. Current repos on Github are:
git@github.com:paepcke/ArcSpreadDevInfo.git
git@github.com:paepcke/ArcSpreadUX.git
git@github.com:paepcke/ArcSpreadMachineRoom.git
git@github.com:paepcke/PigIRAnt.git
(under src): one Pig script that does
the processing, and one Bash shell script that serves as a console
command that invokes the Pig script. Each shell script provides usage
info when invoked with -h, --help, or no parameters.
Each Pig script uses the WebBase loader or the WARC loader to pull in
Web pages. The scripts' outputs are usually files in HDFS that can be
consumed directly by the upper layers, or can be moved into SQLite.
The spreadsheet engine will invoke the shell scripts as OS calls from
Java.
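A minimal sketch of such an OS call, using only the standard
java.lang.ProcessBuilder API; the script path here is a hypothetical
placeholder for one of the console scripts described above:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ShellScriptCall {
    public static void main(String[] args) throws Exception {
        // Hypothetical console script; substitute one of the real Bash wrappers.
        ProcessBuilder pb = new ProcessBuilder("./someUtility.sh", "--help");
        pb.redirectErrorStream(true);                 // fold stderr into stdout
        Process proc = pb.start();

        // Relay the script's output (usage info here; HDFS result paths in real runs).
        BufferedReader out = new BufferedReader(new InputStreamReader(proc.getInputStream()));
        String line;
        while ((line = out.readLine()) != null) {
            System.out.println(line);
        }
        System.out.println("Exit code: " + proc.waitFor());
    }
}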
ilc0 (for info-lab-zero). You'll need an account on that machine. That's where
you do your full tests. The machine has both an HDFS section and a regular
home directory storage section (/home/[userName]). You put your Pig scripts
and corresponding shell scripts into the user section. Results will
show up in HDFS.
Useful HDFS commands on ilc0:
hadoop dfs -ls /
hadoop dfs -ls /user/paepcke
hadoop dfs -cat /user/paepcke/foo
hadoop dfs -getmerge [partfile] [mergedFile][.gz]
The .gz extension automatically gzips. Example:
hadoop dfs -getmerge /user/paepcke/foo .
alias hp='hadoop fs '
alias .ls="hadoop fs -ls"
alias .rm="hadoop fs -rm"
alias .rmr="hadoop fs -rmr"
alias .cat="hadoop fs -cat"
alias .cpfl="hadoop fs -copyFromLocal"
alias .cptl="hadoop fs -copyToLocal"
alias .pull="hadoop fs -getmerge"
~/Software/Hadoop/Hadoop/hadoop-0.20.2/src/hdfs/hdfs-default.xml
http://ilc0/ganglia (cluster CPU and memory usage)
http://ilc0:50030 (Job tracker)
http://ilc0:50070 (HDFS)

myBag = GROUP myTuples all;
REGISTER /usr/local/pig-0.8.0-SNAPSHOT/contrib/piggybank/java/piggybank.jar;
REGISTER /home/paepcke/PigScripts/pigUtils.jar;
tweets = LOAD 'Datasets/morTweetsSmall.csv' USING
org.apache.pig.piggybank.storage.CSVLoader AS (txt:chararray,
source:chararray, dateTime:chararray);
docs = LOAD '$CRAWL_SOURCE'
USING pigir.webbase.WebBaseLoader()
AS (url:chararray,
date:chararray,
pageSize:int,
position:int,
docIDInCrawl:int,
httpHeader,
content:chararray);
The environment variable $CRAWL_SOURCE is assumed to
hold the name of the crawl. See the section on
interacting with WebBase via a browser to understand which name
to use. The above command will conceptually provide a seven-column
relation, where each Web page of the crawl is one row.
In ~/Software/Eclipse/eclipse/eclipse.ini,
change the -Xmx (max heap size) entry to 1GB:
-Xmx1000m
http://diglib.stanford.edu:8091/~testbed/doc2/WebBase/
Once there, find the paragraph on Wibbi, and click on the link there.
You'll find a page that lets you define a stream of pages from one
crawl. On the first page you specify how many pages you want, and how you
want them filtered. On the next page you'll specify which crawl you
want.
On that crawl selection page you'll see the crawl names in the first
column. That's the name the Pig WebBase loader needs to find the
crawl.
When you hit the download button in one of the rows, your browser will
ask you where you want the impending stream to be stored. The file you
specify there will hold all the pages you download.
PigIR. The following utilities are currently available in the package
edu.stanford.pigir.arcspread. They each include a
main() method with an example.
They are available when you build with Maven (mvn compile),
and within Eclipse, if you import your code as a Maven project.
Then, in your code, import what you need. For example,
to use the part-of-speech tagger:

import edu.stanford.pigir.arcspread.POSTagger;

For the WebBase page extraction utility:

import edu.stanford.pigir.webbase.DistributorContact;
import edu.stanford.pigir.webbase.WbRecord;
import edu.stanford.pigir.webbase.wbpull.webStream.BufferedWebStreamIterator;
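A minimal smoke-test sketch, assuming only what the notes above state
(each utility bundles a standard main(String[]) with its own example);
the class name below and any arguments you pass are hypothetical:

import edu.stanford.pigir.arcspread.POSTagger;

public class PigIRSmokeTest {
    public static void main(String[] args) throws Exception {
        // Each arcspread utility ships a main() that runs a built-in example;
        // delegating to it is an easy way to check that the PigIR jar is on
        // the classpath. Pass whatever arguments the utility's example expects.
        POSTagger.main(new String[0]);
    }
}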
If your code uses the PigIR utilities, then your
users will need access to the PigIR jar. The easiest
way to do this is to splice the PigIR dependency into your
pom.xml dependencies section (you need this, for example, to use the
POSTagger utility).
This will automatically download the jar file, and adjust your Java path to find entries within it.