Monthly Archives: January 2011

UNIX Useful Commands

In the course of trouble-shooting or performance tuning, it is inevitable that you go down to the OS level. Typically, enterprise Java applications are hosted on UNIX-like systems such as Solaris, Linux or AIX. Here are some commands that I've used or seen system engineers use. I'll add to the list whenever I remember something.

As there are many variants of UNIX, it is best to consult your man pages for these commands.

To find out the top processes in terms of CPU/memory usage:

top

If you discover that your Java application is hogging the CPU or memory, sometimes you might have to kill the process. The top command gives you the PID of the process. The results of the top command are dynamically refreshed.

To kill a process by its PID:

kill <pid>

If the process is persistent and shows up again in top, try:

kill -9 <pid>

Sometimes you would want a thread dump to analyze what went wrong (for a Java process, kill -3 sends SIGQUIT and the JVM prints a thread dump; for other processes it typically terminates them with a core dump):

kill -3 <pid>

To do a directory listing (with file details), similar to Windows' dir command, do this:

ls -l

Sometimes you want to see hidden files:

ls -la

Changing directory is similar to Windows:

cd /<target directory>

If you want to read your running log files, especially the latest logs:

tail -f /Applications/jboss/logs/superapp.log

There's the head command too! Say your log files are capped at 5 MB and roll over to a new file once the limit is reached; you might want to use head to look at the first few lines of the new log file (it shows the first 10 lines by default, and -n lets you specify how many):

head -n 20 /Applications/jboss/logs/superapp.log

To display the contents of a small (non-binary) file, we could:

cat /<directory>/<filename>

For large files:

more /<directory>/<filename>


UNIX How-to: house-keeping script

In an earlier post, I wrote about not using Java for performing house-keeping tasks such as moving, copying or deleting files on the OS (be it UNIX or Windows). In this post, I'm going to reproduce what I did a couple of years ago: the UNIX script/commands for performing these tasks.

The first step in solving these problems would be to find all the target files. Once we find them, we can perform the necessary remedial actions. We can find the files by their last accessed (read) or last modified (contents changed) date using the find command. The find command is recursive, so it looks into subdirectories too.

Search for the list of *.log files that were last accessed more than 90 days (3 months) ago:

find /<directory> -iname "*.log" -atime +90 >> lastaccessedlogsolderthan90days.txt

Notice that I actually created a text file to hold the list of such files. I find this useful for a couple of reasons:

  • I could iterate through the file contents and perform the necessary actions.
  • It tells me which files I'm going to act on, so I can verify before doing anything irreversible.

Search for the list of *.log files that were last modified more than 90 days (3 months) ago:

find /<directory> -iname "*.log" -mtime +90 >> lastmodifiedlogsolderthan90days.txt

Note that you should do a man find to check your options, as we're probably using different versions of UNIX. These commands were tested on Mac OS X Tiger.

If you're searching for files that were last accessed or modified within the last n days, you should be using these instead:

-atime -90 or -mtime -90

If it's meant to be an exact match (exactly 90 days), then:

-atime 90 or -mtime 90

You could easily modify the command to include the move/copy/delete action. If you’re sure that it is perfectly safe to do so, you could simply do this:

find /<directory> -iname "*.log" -mtime +90 -exec rm {} \;

You could do a copy like this; your original files remain intact:

find /<directory> -iname "*.log" -mtime +90 -exec cp {} /<target directory> \;

Sometimes you want to move the files to an archive folder for a further period before purging, or so that a tape backup job can pick them up from a specific directory:

find /<directory> -iname "*.log" -mtime +90 -exec mv {} /<target directory> \;

If you wish to have visibility of what you copy/move/delete, it is better to redirect the find results to a text file first, then parse the contents and, for each row, perform the action and append the file name to another results file. That way you know exactly what has been moved/copied/deleted, as sketched below.
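For example, a minimal sketch of such a loop (the directories and file names here are made up for illustration, and it assumes the file paths contain no spaces):

#!/bin/sh
# Collect the list of old log files first, so the list can be reviewed
find /apps/logs -iname "*.log" -mtime +90 > /tmp/oldlogs.txt

# Act on each file in the list and keep an audit trail of what was moved
while read f
do
  mv "$f" /apps/archive/
  echo "$f" >> /tmp/moved.txt
done < /tmp/oldlogs.txt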


How-to: write Oracle stored procedure & function

Both stored procedures & functions can be thought of as a sequence of SQL statements to be executed. Typically, a stored procedure performs a task whereas a function computes and returns a value.

Typically, tools such as TOAD or Oracle SQL Developer have UI/wizards to assist in the creation. The syntax is pretty similar for both.

Syntax for creating stored procedure:

CREATE [OR REPLACE] PROCEDURE procedure_name
[ (parameter [,parameter]) ]
IS
[declaration_section]
BEGIN
executable_section
[EXCEPTION
exception_section]
END [procedure_name];

An example would be:

CREATE OR REPLACE PROCEDURE myprocin(x VARCHAR) IS
BEGIN
INSERT INTO oracle_table VALUES(x);
END;
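
Once created, the procedure can be called from an anonymous PL/SQL block or via EXEC in SQL*Plus/SQL Developer. A quick sketch, assuming the oracle_table from the example exists:

-- Call the procedure from an anonymous PL/SQL block
BEGIN
  myprocin('hello world');
END;
/

-- Or, from SQL*Plus / SQL Developer
EXEC myprocin('hello world');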

Syntax for creating stored function:

CREATE [OR REPLACE] FUNCTION function_name
[ (parameter [,parameter]) ]
RETURN return_datatype
IS | AS
[declaration_section]
BEGIN
executable_section
[EXCEPTION
exception_section]
END [function_name];

An example would be:

CREATE OR REPLACE FUNCTION modulo(x IN NUMBER, y IN NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN MOD(x, y);
END;
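
Unlike a procedure, the function returns a value and can be used directly in a SQL statement. For example:

-- Call the function from plain SQL (returns 1)
SELECT modulo(10, 3) FROM dual;

-- Or use it within PL/SQL
DECLARE
  remainder NUMBER;
BEGIN
  remainder := modulo(10, 3);
END;
/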

Serialization and deserialization

Have you ever used Java serialization & de-serialization at work? I mean other than implementing the Serializable interface. Serialization is the process of "flattening" an object so that it can be sent across the network; de-serialization is the reverse. A "flattened" object is cheaper and faster to send across the network.

I'd seen a scenario where serialization & de-serialization could have been used to simplify the system design, but weren't. I will simplify the scenario: a financial institution (FI) seeks information about a person from a credit bureau (CB). CB provides a service for FI to check on a person's credit worthiness. A check could return zero or many reports (in PDF format), and each request from FI could cover one or multiple persons.

What happened was that, in the original design, the output from CB contained an XML file and all the reports. Each PDF report was named by the person's unique ID, suffixed with a unique serial number. The XML file looks something like this:

  <persons>
    <person>
      <id>ABC1234567N</id>
      <hit>true</hit>
      <reports>3</reports>
    </person>
    <person>
      <id>ABC1222567N</id>
      <hit>false</hit>
      <reports>0</reports>
    </person>
  </persons>

There are 2 persons that the FI wishes to check on. The first person has a hit, and he has 3 reports related to his credit worthiness. The second person is clean. CB sends FI this XML file together with the 3 PDF reports. The FI needs to read the XML and tally it against the total number of PDF reports sent along; if the numbers don't tally, it requests CB to resend.

I felt that serialization could have a place here and would simplify things further.

public class Person implements Serializable {
  String id;
  List<Report> reports;
}

public class Report implements Serializable {
  // the PDF report is stored in memory as a byte[]
  String filename;
  byte[] content;
}

All we need to do is serialize each Person object into a file. The file is named by the person's unique ID, and this filename is then stored in the XML file sent by CB. FI just needs to verify that each entry in the XML is matched by a serialized file.

At FI's end, de-serialization takes place and the Person object is ready for use! We can remove the code that associates each PDF report with a person, and there are fewer files to be bundled and sent over the network too. Once FI has the Person object in memory, it is trivial to use ORM to persist it to the DB.
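
A minimal sketch of how this could look (the class name and file handling are made up for illustration, error handling is simplified, and Person is the class defined above):

import java.io.*;

public class PersonTransfer {

    // CB's side: "flatten" the Person (reports included) into a single file,
    // e.g. named after the person's unique ID
    public static void write(Person person, String filename) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(filename));
        try {
            out.writeObject(person);
        } finally {
            out.close();
        }
    }

    // FI's side: de-serialize the file back into a ready-to-use Person object
    public static Person read(String filename) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(filename));
        try {
            return (Person) in.readObject();
        } finally {
            in.close();
        }
    }
}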

There are still gotchas, like what if CB missed out a report, or gave the wrong report. Missing reports can still be checked for, but a wrong report (in terms of its contents) is likely to go undetected until it is read by a human.


Don't use Java as a replacement for native file I/O commands

Actually, you can replace Java with your favorite programming language, such as C or C#.

I was given the task of looking after some house-keeping jobs. We realized that the jobs were taking very long to complete and, on some occasions, exceeded the allocated time frame and had to be aborted. Our disk-space quota was getting exhausted quickly. The house-keeping jobs are pretty simple and straightforward: they look into various directories and delete outdated log and temporary files. Some jobs move output files to another directory so that a backup can be done.

Our initial suspicion was that as the system grew, our output was getting larger and hence the jobs took more time to complete. Inspecting the log file sizes and quantity confirmed this. As the system grew in complexity, developers added more logging, and more output files were also generated.

In our preliminary discussions, we decided to cut down on unnecessary logging. We saw two advantages in this: a reduction in log file quantity and size, and an improvement in system performance, since logging requires file I/O and that's expensive. After a few rounds of reduction, we hit a plateau: our house-keeping jobs still failed occasionally.

After more discussions, I looked into the house-keeping jobs' code to see if it could be optimized. Somehow it dawned on me: why were we using Java for file I/O when we could just use the OS commands?! So I picked a job and wrote a shell script for it. The script looked for files older than 90 days and deleted them. The old job was paused, a new job was added to cron and tested, and it worked fine. I then moved on to rewrite the rest of the house-keeping jobs.
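
The rewritten job boiled down to something like this (the paths and schedule here are made up for illustration):

#!/bin/sh
# housekeep.sh - purge log files older than 90 days
find /apps/myapp/logs -iname "*.log" -mtime +90 -exec rm {} \;

# crontab entry: run the clean-up every day at 2am
0 2 * * * /apps/scripts/housekeep.sh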

I learnt a couple of lessons from this. Java (or whatever programming language you're good at) isn't the silver bullet for every single situation; use the right tool for the right problem. File I/O is definitely faster when we can use the native OS commands. In fact, we had less code to maintain since the house-keeping jobs became shell scripts. My guesses as to why the initial approach was taken: a lack of team discussion to come up with the appropriate solution; developers too comfortable with Java, forgetting that the OS commands are always there; or maybe unfamiliarity with the OS commands.


Spam/con mail

Frankly, this is one of the worst spam/con mails I have ever come across in my inbox. Just what is the sender trying to prove? Or was this meant as a joke?


Favoring the split method over StringTokenizer

StringTokenizer is a legacy class that is retained for compatibility reasons, although its use is discouraged in new code. The recommended replacement is the split method of String or the java.util.regex package. If possible, legacy code using StringTokenizer should be refactored.

Example:
We have a method that converts a String-represented time value (e.g., 1.25 hours) into a new format (01:15).

public String getTimeFormattedString(String time) {
  String timeDisplay = null;
  if (time.indexOf(".") != -1) {
    String hour = "";
    String minutes = "";
    String min = "";
    StringTokenizer str = new StringTokenizer(time, ".");
    while (str.hasMoreTokens()) {
      hour = str.nextToken();
      minutes = str.nextToken();
    }
    minutes = "0." + minutes;
    int mts = (int) Math.round(Double.parseDouble(minutes) * 60);
    if (hour.length() < 2) {
      hour = "0" + hour;
    }
    if (new Integer(mts).toString().length() < 2) {
      min = "0" + new Integer(mts).toString();
    } else {
      min = "" + mts;
    }
    timeDisplay = hour + ":" = min;
  } else {
    if (time.length() < 2) {
      time = "0" + time;
    }
    timeDisplay = time + ":00";
  }
  return timeDisplay;
}

To refactor by using the split method of String:

public String getTimeFormattedString(String time) {
  String timeDisplay = null;
  if (time.indexOf(".") != -1) {
    String[] duration = time.split("\\.");
    String hrs = (duration[0].length() == 1 ? "0" + duration[0] : duration[0]);
    int minutes = (int) Math.round(Double.parseDouble("0." + duration[1]) * 60);
    String mins = (minutes < 10 ? "0" + minutes : "" + minutes);
    timeDisplay = hrs + ":" + mins;
  } else {
    timeDisplay = time + ":00";
  }
  return timeDisplay;
}

The refactored method is significantly shorter and easier to understand and maintain. Of course, with proper database design, the time component would have been represented as a DATE or TIMESTAMP rather than as a VARCHAR.