Revision of README-ANRLs-LWP from Thu, 08/08/2013 - 09:31

ReadMe: ANRL’s Local Webpage

Updated: Aug 08, 2013

Basic Stuff

  • Can you tell that ANRL’s Local Webpage was written in very simple HTML? It’s not a snappy, pretty way of getting this content onto a Webpage, but it works. If I had had lots of time when I was making this Webpage, I could have learned how to make a pretty one with HTML5, PHP, CSS, JavaScript, AJAX and others. Instead, I focused on displaying content using simple tools. One day the whole Webpage will be re-written.
  • The Webpage uses the old HTML frames construct. This is obsolete, and I want to replace it with a modern method. Suggestions?
  • The Webpage is written by a Perl program which I wrote. The whole Webpage can be re-built in a few seconds.
  • I use quite a bit of basic shell scripting in the process of building the page.
  • In principle, the whole process could run on any platform. I work on a Mac. If anyone wanted to convert this to a PC, the parts that use shell scripting would need to be re-worked.
  • The content of the Webpage is data driven in that I look at folders and then display content of the folders.
  • Processing folders to get them in shape for the data driven process is another matter. I have a number of programs that do that, and descriptions of them will be added to this ReadMe file sometime in the future.
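
To illustrate what "data driven" means here, the following is a minimal sketch in Python, not the Perl program that actually builds the Webpage. The function name build_listing_html and the choice to skip hidden files are assumptions made for the example. It reads a folder and returns an HTML list with one link per file, so the page content is whatever is in the folder at build time.

```python
import html
import os
from urllib.parse import quote


def build_listing_html(folder):
    """Return an HTML fragment listing the files found in `folder`.

    Hypothetical helper for illustration; not taken from the real build program.
    """
    # Look at the folder at build time: the page shows whatever is on disk.
    names = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
        and not name.startswith(".")  # assumption: skip hidden files
    )
    # One list item per file; escape the label and URL-encode the link target.
    items = [
        '<li><a href="{}">{}</a></li>'.format(quote(name), html.escape(name))
        for name in names
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Adding or removing a file in the folder and re-running the build is then enough to change the page; no HTML has to be edited by hand.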

Screen Layout

  1. Lower Left Frame: Primary Navigation Frame. Clicking on one of the choices will open up either the Upper Left Frame’s choices or the choices in the Main Frame.
  2. Upper Left Frame: Secondary Navigation Frame. Clicking on one of the choices will display content in the Main Frame. Sometimes you will see a “Back” button; if it’s there, clicking on it will return to a menu in the Main Frame.
  3. Top Frame: The Title Frame.
  4. Main Frame: The large frame in the lower right, which is used to display all content.

Data Pre-Processing

  • Elimination of duplicate content: Many image files were duplicates but had different names. Many files with the same name were not the same file. A good number of Perl scripts were used to boil down the database to achieve its current status:
  • No Duplicate File Names
  • No Duplicate Content
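
The two kinds of duplicates described above can be detected separately. The sketch below is a Python illustration, not one of the Perl scripts actually used; the function names and the choice of SHA-256 are assumptions. It groups files by a hash of their content (same content, possibly different names) and by file name (same name, possibly different content).

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Walk `root` and report duplicate content and duplicate file names.

    Hypothetical helper for illustration only.
    Returns (same_content, same_name):
      same_content: list of path groups whose files have identical bytes
      same_name:    dict of file name -> paths that share that name
    """
    by_digest = defaultdict(list)
    by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
            by_name[name].append(path)
    # Keep only groups with more than one member.
    same_content = sorted(sorted(p) for p in by_digest.values() if len(p) > 1)
    same_name = {n: sorted(p) for n, p in by_name.items() if len(p) > 1}
    return same_content, same_name
```

A report like this only finds the candidates; deciding which copy to keep or how to rename a clashing file is still a manual decision.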

Data Structures

  • For each folder in the database with image files or PDF files, two new folders are created:
  • Image_Files for image files
  • PDF_Files for PDF files
  • Original images are moved to the Image_Files folder
  • Original PDF files are moved to the PDF_Files folder
  • The rationale for this is to keep track of the image files that need to go into PDF files, and to allow the PDF files to be re-built if images are rotated, cropped, etc.
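
The folder layout above can be sketched as follows. This is a Python illustration under stated assumptions, not Dir_Inspector.pl itself: the function name sort_folder and the list of image extensions are hypothetical. Only the folder names Image_Files and PDF_Files come from this ReadMe.

```python
import os
import shutil

# Assumption: which extensions count as image files is not stated in the ReadMe.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def sort_folder(folder):
    """Move images into Image_Files and PDFs into PDF_Files inside `folder`.

    Hypothetical helper for illustration. Returns the names moved to each folder.
    """
    moved = {"Image_Files": [], "PDF_Files": []}
    plan = []  # (file name, target folder name)
    for name in sorted(os.listdir(folder)):
        if not os.path.isfile(os.path.join(folder, name)):
            continue
        ext = os.path.splitext(name)[1].lower()
        if ext in IMAGE_EXTENSIONS:
            plan.append((name, "Image_Files"))
        elif ext == ".pdf":
            plan.append((name, "PDF_Files"))
    if not plan:
        return moved  # no images or PDFs here: leave the folder alone
    # Both folders are created whenever the folder holds images or PDFs.
    for target in moved:
        os.makedirs(os.path.join(folder, target), exist_ok=True)
    for name, target in plan:
        shutil.move(os.path.join(folder, name), os.path.join(folder, target, name))
        moved[target].append(name)
    return moved
```

Files of any other type are left where they are.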

Programs Used for Building the Web Site

  1. Dir_Inspector.pl is used to do most of the work of building the Image_Files and PDF_Files folders and moving data around.
  2. ANRL_Make_PDF_Web.pl writes a shell script to create PDF files from the contents of each Image_Files folder.
  3. The shell script does the following (items 4 to 8):
  4. Copies the contents of Image_Files to a temp directory.
  5. Runs an Automator file which creates a new PDF file.
  6. Re-names the PDF file to “original_folder_name_PDF.pdf”. If you look in the PDF_Files folders you will see files whose names end in “_PDF.pdf”. That ending tells you which PDF files were created with my programs.
  7. Copies the created PDF file to its proper location.
  8. Removes all files from the temp folder.
  9. Note: A good number of image files need to be cropped, rotated, straightened, brightened and so on. It’s possible to remove the PDF file, the one with the “something”_PDF.pdf name, re-run Dir_Inspector, then re-run the script which runs Automator and get a new PDF file.
  10. AANR_WEB.pl is the main program that builds pages for audio files and displays the contents of THE_BIG_ANRL
  11. Test_4_OCR is a program I wrote to test for the OCR status of files. It was never that successful, because I never perfected the feedback between this program and the main web building programs. The algorithm simply does a case-insensitive grep on the file for the text string ‘font’. It seems simple but takes a while to run.
  12. When viewing a PDF file, drag the cursor across some text. If the text is highlighted, the file has gone through OCR. If nothing happens, then either the file has not gone through the OCR process or that part of the page didn’t look like text to the OCR algorithm.
  13. Gemini is a Mac program for finding and deleting duplicate files.
  14. ABBYY FineReader Express is available to do OCR on individual files. The Mac version has no batch capability. We will try to use our main ABBYY program to do bulk OCR on the hundreds of PDF files from scanned documents.
  15. DropSync is a Mac program that synchronizes files between volumes. When I’m working on changes to the web site, sometimes thousands of files change. I use DropSync to copy the master copy to other volumes for backup and for viewing at the Library:
  16. Master_1TB-external to Drobo_5D
  17. Drobo_5D to iMac 27"
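
The PDF-building steps in items 4 to 8 can be sketched as below. This is a Python illustration, not the shell script that ANRL_Make_PDF_Web.pl writes. The function names are hypothetical, and the Automator step is represented by a make_pdf callable passed in by the caller, because how Automator is invoked is not described here. The “_PDF.pdf” naming rule and the Image_Files and PDF_Files folder names come from this ReadMe.

```python
import os
import shutil
import tempfile


def pdf_name_for(folder):
    """Return the name a generated PDF gets: <original_folder_name>_PDF.pdf."""
    base = os.path.basename(os.path.normpath(folder))
    return base + "_PDF.pdf"


def build_folder_pdf(folder, make_pdf):
    """Build one PDF from folder/Image_Files and place it in folder/PDF_Files.

    `make_pdf(image_dir, out_path)` is a stand-in for the Automator step
    (an assumption for this sketch). Returns the path of the final PDF.
    """
    image_dir = os.path.join(folder, "Image_Files")
    pdf_dir = os.path.join(folder, "PDF_Files")
    temp_dir = tempfile.mkdtemp()
    try:
        # Copy the contents of Image_Files to a temp directory.
        work_dir = os.path.join(temp_dir, "images")
        shutil.copytree(image_dir, work_dir)
        # Create a new PDF file from the copied images.
        raw_pdf = os.path.join(temp_dir, "output.pdf")
        make_pdf(work_dir, raw_pdf)
        # Re-name it so generated PDFs can be recognized later.
        named_pdf = os.path.join(temp_dir, pdf_name_for(folder))
        os.rename(raw_pdf, named_pdf)
        # Copy the PDF to its proper location.
        os.makedirs(pdf_dir, exist_ok=True)
        final_path = os.path.join(pdf_dir, os.path.basename(named_pdf))
        shutil.copy2(named_pdf, final_path)
    finally:
        # Remove all files from the temp folder, even if a step failed.
        shutil.rmtree(temp_dir)
    return final_path
```

Because the originals stay in Image_Files, deleting the generated “_PDF.pdf” file and running this again after fixing an image gives a fresh PDF.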

OCR

None of the created PDF files have gone through the OCR process. One day we’ll do that.
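
For reference, the check that Test_4_OCR performs (a case-insensitive search for the string ‘font’ in the file) can be sketched in Python as below. This is an illustration, not the original program, and the function name looks_like_ocr is hypothetical. It is a rough heuristic: a match only suggests the PDF contains text, and a PDF may mention fonts for other reasons.

```python
def looks_like_ocr(path, chunk_size=1 << 20):
    """Return True if the bytes 'font' (any letter case) occur in the file.

    Hypothetical helper mirroring a case-insensitive grep for 'font'.
    The file is read in chunks so large PDFs are not loaded into memory at once.
    """
    needle = b"font"
    tail = b""  # last few bytes of the previous chunk, to catch split matches
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # reached end of file without a match
            window = (tail + chunk).lower()
            if needle in window:
                return True
            tail = window[-(len(needle) - 1):]
```

Returning at the first match avoids reading the rest of the file, which may help with the run time on large scans.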

Tips

  • You will be viewing this Webpage with a browser, so all the browser tricks you know should work: back, forward, reload, save as, view source and so on.
  • Older browsers may not be able to play the .mp3 files, which are copies of cassette tape recordings. You could try using a different browser or a newer version of your current browser.
  • You can zoom the screen if necessary.
  • You should be able to re-size the frames. For example, if you minimize the Top Frame, there will be more viewing area in the Main Frame. To re-size, simply click and drag the edge of a frame.

Call for Help

  • If you know of a better way of displaying data like this Webpage does, by all means contact me and explain. I need ideas and I need help.

Contact Me

Bob Proctor bobproct@me.com 407–552–1543