Finding & Replacing Filenames & Text Inside Files Using Perl
A few “Bash Tricks” for the peeps
First up:
Find & replace filenames
This finds files that match Somethinghtml.html and renames them to Something.html. (HTTrack users will recognize these two examples as fixes to common naming problems when scraping a site.) Note you can change which files are searched by changing the find mask at the end of the line.
<code>
perl -e 'for (@ARGV) { ($n = $_) =~ s/(.*)html\.html$/$1.html/ and rename $_, $n }' `find ./ -name '*.html'`
</code>
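If you want to see what would be renamed before touching anything, here is a rough dry-run sketch in plain shell rather than perl. The function names fixed_name and preview_renames are made up for illustration, and it assumes the unwanted suffix is always a literal html.html at the very end of the name.

```shell
# Hypothetical helper: compute the corrected name for one path.
# Names that do not end in "html.html" are returned unchanged.
fixed_name() {
    case $1 in
        *html.html) printf '%s\n' "${1%html.html}.html" ;;
        *)          printf '%s\n' "$1" ;;
    esac
}

# Hypothetical helper: print the mv commands without running them.
# $1 is the directory to search (defaults to the current one).
preview_renames() {
    find "${1:-.}" -name '*html.html' | while IFS= read -r f; do
        printf 'mv %s %s\n' "$f" "$(fixed_name "$f")"
    done
}
```

Running preview_renames on the scraped site's directory only prints the moves it would make, so nothing on disk changes until you are happy with the list.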
Then Find & Replace text inside of files:
This finds things that have been incorrectly prefixed with dev.mistcat.com and removes that prefix so that this 2112.js file can load from a remote domain correctly. Note you can change the find mask at the beginning to restrict or un-restrict your searched files for text replacement.
<code>
find . -name '*.html' | xargs perl -pi -e 's/http:\/\/dev\.mistcat\.com\/(www\.dwin1\.com\/2112\.js)/http:\/\/$1/g;'
</code>
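Since an in-place edit is hard to undo, it can help to preview the substitution first. The sketch below swaps in sed for perl purely for the preview, with the same pattern written using # as the delimiter so the slashes need no escaping. The function name strip_dev_prefix and the index.html file in the usage comment are assumptions for illustration.

```shell
# Hypothetical helper: apply the same prefix removal to stdin and write
# the result to stdout, leaving the original file untouched.
strip_dev_prefix() {
    sed 's#http://dev\.mistcat\.com/\(www\.dwin1\.com/2112\.js\)#http://\1#g'
}

# Preview the change for one file (index.html is just an example name):
#   strip_dev_prefix < index.html | diff index.html -
```

If the diff shows only the lines you expected, the in-place version should be safe to run.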
Ubuntu Apache Tuning
So I recently needed to do a little quick performance tuning at work on one of our Ubuntu server installs. I’ve been using some of the wisdom found here to refresh myself on the basics, and I wrote a quick little one-line awk script to give me the total amount of memory being used by Apache at any given time, along with the average Apache process size at that moment. I figured someone else might get some use out of it and decided to post it: (the following should all be on one line, but hey)
<code>
ps -ylC apache2 --sort=rss | awk 'NR > 1 {x += $8; y += 1} END {print "Apache Memory Usage (MB): " x/1024;
print "Average Process Size (MB): " x/(y*1024)}'
</code>
So the ‘ps’ command gives us process information for every process whose command name is apache2, sorted by the physical/resident memory (RSS) each one is taking up. I then feed that data into awk; on Ubuntu, the 8th column is the memory info, in kilobytes. I total that up in my x variable, count the processes in my y variable, and then print the results out nicely to the terminal. (Incidentally, you could use this in a cron job to create a very basic sort of historical tracking if you wanted to see Apache memory usage over time.)
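For the cron idea, something along these lines might work. It is only a sketch: the function names apache_mem_summary and log_apache_mem, the log path and the five-minute schedule are all assumptions, and it relies on RSS being the 8th column of `ps -yl` output as described above.

```shell
# Hypothetical helper: read `ps -ylC apache2` style output on stdin
# (header line first, RSS in KB in column 8) and print
# "<total MB> <average MB per process>".
apache_mem_summary() {
    awk 'NR > 1 { total += $8; n += 1 }
         END { if (n > 0) printf "%.1f %.1f\n", total / 1024, total / (n * 1024) }'
}

# Hypothetical helper: append one timestamped sample to the log file in $1.
# Nothing is written when no apache2 processes are running.
log_apache_mem() {
    summary=$(ps -ylC apache2 | apache_mem_summary)
    if [ -n "$summary" ]; then
        printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$summary" >> "$1"
    fi
}

# Example crontab entry (script path and log path are made up):
#   */5 * * * * /usr/local/bin/log_apache_mem.sh /var/log/apache_mem.log
```

Each log line then holds a timestamp, the total in MB and the average in MB, which is easy to graph or grep later.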
So as I understand it, your MaxClients setting in apache should follow this formula:
MaxClients ≈ (RAM – size_all_other_processes)/(size_apache_process)
So with my handy script you now have the size_apache_process variable, you probably know your box’s total RAM (‘free -m’ will tell you if you don’t), and I’ve been estimating ‘size_all_other_processes’ as about 20% of total RAM. It occurs to me that I could probably write a script to total up the size of all other processes as well, but I’ll have to investigate that a little bit… Anyhow, it’s pretty quick to figure out a good MaxClients setting using this script!
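Here is a rough sketch of that arithmetic, plus one possible way to total up everything that is not Apache. The function names max_clients and other_processes_mb are invented for this example, all sizes are in MB, and summing RSS counts shared memory more than once, so treat the result as a ballpark figure rather than a measurement.

```shell
# Hypothetical helper: MaxClients ~ (RAM - other) / apache_process_size.
# Arguments, all in MB: total RAM, size of all other processes,
# average Apache process size. The result is rounded down.
max_clients() {
    awk -v ram="$1" -v other="$2" -v proc="$3" \
        'BEGIN { if (proc <= 0) { print 0; exit }
                 printf "%d\n", int((ram - other) / proc) }'
}

# Hypothetical helper: sum the RSS (KB, column 1) of every process whose
# command name (column 2) is not apache2, and print the total in whole MB.
# Expects `ps -e -o rss= -o comm=` style input on stdin.
other_processes_mb() {
    awk '$2 != "apache2" { kb += $1 } END { printf "%d\n", kb / 1024 }'
}

# Example with made-up numbers: a 4096 MB box, 819 MB (about 20%) for
# everything else, and 25 MB per Apache process:
#   max_clients 4096 819 25
# Using a measured figure instead of the 20% guess:
#   max_clients 4096 "$(ps -e -o rss= -o comm= | other_processes_mb)" 25
```

With those example numbers the formula works out to (4096 - 819) / 25, which rounds down to 131.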
It must be summertime…
Haines Point, Sat afternoon:
Some inspiration for your personal projects
I’ve always been a fan of Ze Frank: his one-year video blogging project was funny and insightful, and he always seems to have a great outlook on life and creativity spilling out of him at every step. I recently got a chance to watch his talk at Webstock 2009, and I realized he’s not just fun and creative, but wise and inspiring to boot. I hope you like it too. =)
Update: Weak, apparently Vimeo won’t let me repost the video directly… Check it out here.




