Newton Institute Seminar : van Houwelingen, JC, 17/06/2008: "Global testing of association and/or predictability in regression problems with p>>n predictors"
'via Blog this'
Friday, November 30, 2012
Logistic
Logistic: "
In order to find the matrix B for which L is minimised, a Quasi-Newton Method is used to search for the optimized values of the m*(k-1) variables. Note that before we use the optimization procedure, we 'squeeze' the matrix B into a m*(k-1) vector. For details of the optimization procedure, please check weka.core.Optimization class.
Although original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights.
For more information see:
le Cessie, S., van Houwelingen, J.C. (1992). Ridge Estimators in Logistic Regression. Applied Statistics. 41(1):191-201."
'via Blog this'
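Note to self: the "squeeze" step in the quote can be pictured with a small sketch. This is not Weka's code. The function names `squeeze` and `unsqueeze` are made up for illustration, and the row-major ordering is an assumption (Weka may lay the vector out differently).

```python
def squeeze(B):
    """Flatten an m x (k-1) matrix, given as a list of rows, into one flat list.

    Row-major order is an assumption made for this sketch.
    """
    return [value for row in B for value in row]


def unsqueeze(vec, m, k):
    """Rebuild the m x (k-1) matrix from a flat vector of length m*(k-1)."""
    cols = k - 1
    if len(vec) != m * cols:
        raise ValueError("expected %d values, got %d" % (m * cols, len(vec)))
    return [vec[r * cols:(r + 1) * cols] for r in range(m)]
```

An optimizer that only understands flat parameter vectors can then work on `squeeze(B)`, and the result can be turned back into a matrix with `unsqueeze`.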
Logistic Regression
Logistic Regression: "Logistic Regression (SGD)
Logistic regression is a model used for prediction of the probability of occurrence of an event. It makes use of several predictor variables that may be either numerical or categorical.
Logistic regression is the standard industry workhorse that underlies many production fraud detection and advertising quality and targeting products. The Mahout implementation uses Stochastic Gradient Descent (SGD) to allow large training sets to be used.
For a more detailed analysis of the approach, have a look at the thesis of Paul Komarek:
http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en
See MAHOUT-228 for the main JIRA issue for SGD.
"
'via Blog this'
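Note to self: a minimal sketch of what SGD training for binary logistic regression looks like, in plain Python. This is not Mahout's implementation. The names `train_sgd` and `predict_proba`, the fixed learning rate, and the epoch count are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, features):
    """Probability of the positive class; the last weight is the bias."""
    z = weights[-1] + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train_sgd(examples, n_features, learning_rate=0.1, epochs=50, seed=0):
    """Fit weights by updating after every single example (stochastic updates).

    examples: iterable of (feature_list, label) pairs with label 0 or 1.
    """
    rng = random.Random(seed)
    weights = [0.0] * (n_features + 1)  # n_features weights plus one bias term
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for features, label in data:
            error = label - predict_proba(weights, features)
            for i, x in enumerate(features):
                weights[i] += learning_rate * error * x
            weights[-1] += learning_rate * error
    return weights
```

Because each update touches only one example, the training set never has to fit in memory at once, which is the property the quote is pointing at.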
Logistic
Logistic: "Class for building and using a multinomial logistic regression model with a ridge estimator.
There are some modifications, however, compared to the paper of le Cessie and van Houwelingen (1992):
If there are k classes for n instances with m attributes, the parameter matrix B to be calculated will be an m*(k-1) matrix.
"
'via Blog this'
WEKA - Convert from arff to csv from command line?
WEKA - Convert from arff to csv from command line?: " weka.core.converters.CSVSaver -i -o "
'via Blog this'
java -Xmx1500m -classpath /usr/share/java/weka.jar weka.core.converters.CSVSaver -i test.arff -o test.csv
Getting Started
Getting Started: "/* local mode */
$ pig -x local ...
/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ..."
'via Blog this'
Thursday, November 29, 2012
Performing Data Science with HBase: Strata Conference + Hadoop World - O'Reilly Conferences, October 23 - 25, 2012, New York, NY
Performing Data Science with HBase: Strata Conference + Hadoop World - O'Reilly Conferences, October 23 - 25, 2012, New York, NY: "Regardless, large amounts of data – especially data about users intended for use in an online system such as an e-commerce site, gaming platform, or ad network – is stored in HBase, and data scientists must be able to perform investigative analysis on this information to better understand their business and improve these online processes. And the read/write model of HBase offers advantages over HDFS to the data scientist building complex analysis pipelines."
'via Blog this'
Software Engineer, Data Infrastructure Engineering | Facebook Careers
Software Engineer, Data Infrastructure Engineering | Facebook Careers: "Facebook is seeking a Software Engineer to join the Data team. The ideal candidate will dream about distributed systems for the parallel processing of massive quantities of data, be familiar with Hadoop/Pig/HBase and MapReduce/Sawzall/Bigtable, and frequently think to themselves, 'Yeah, that works for 500 MB of data; what about 500 TB?' This position is full-time and based in our New York office."
'via Blog this'
NetInfo Manager - Wikipedia, the free encyclopedia
NetInfo Manager - Wikipedia, the free encyclopedia: "Methods for editing user attributes on Mac OS X Leopard (user shell, uid, primary gid, home directory path)
command line: dscl (Panther, Tiger, Leopard)
System Preferences:Accounts Pane – unlock the accounts pane – right-click/control-click on a user account – pop-up menu "advanced" – this panel will let you edit user attributes.
Note: you may need to reboot after changing this sort of information, or run 'dscacheutil -flushcache' from the command line."
'via Blog this'
Wednesday, November 28, 2012
Why is Mahout necessary? | LinkedIn
Why is Mahout necessary? | LinkedIn: "Vishwakarma S. • We can understand the value of Mahout by following these two approaches of machine learning. One approach would be to collect, clean, and then use all the data to learn a model using an algorithm in Mahout. This approach does not yield a good result because real data is always dirty (noise, skewed, missing values, error, correlated, etc.). Generally, ML is a two-step process: Data Preprocessing and Model Learning. "
'via Blog this'
Why is Mahout necessary? | LinkedIn
Why is Mahout necessary? | LinkedIn: "Raphael C. • The 'Mining Massive Data Sets' course at Stanford is also pretty good.
"
'via Blog this'
Jingle Bells - Wikipedia, the free encyclopedia
Jingle Bells - Wikipedia, the free encyclopedia: "Now the ground is white
Go it while you're young,
Take the girls tonight
and sing this sleighing song;"
'via Blog this'
How to record audio in Chrome with native HTML5 APIs
How to record audio in Chrome with native HTML5 APIs: "This happened right in the middle of our efforts to build the Dubjoy Editor, a browser-based, easy to use tool for translating (dubbing) online videos. Relying on Flash for audio recording was our first choice, but when confronted with this devastating issue, we started looking into other options. Using native HTML5 APIs seemed like a viable solution.
"
'via Blog this'
Map/Reduce Tutorial
Map/Reduce Tutorial: "Although the Hadoop framework is implemented in Java™, Map/Reduce applications need not be written in Java.
Hadoop Streaming is a utility which allows users to create and run jobs with any executables (e.g. shell utilities) as the mapper and/or the reducer.
Hadoop Pipes is a SWIG-compatible C++ API to implement Map/Reduce applications (non-JNI™ based)."
'via Blog this'
Writing An Hadoop MapReduce Program In Python @ Michael G. Noll
Writing An Hadoop MapReduce Program In Python @ Michael G. Noll: "Precisely, we compute the sum of a word’s occurrences, e.g. (“foo”, 4), only if by chance the same word (“foo”) appears multiple times in succession. In the majority of cases, however, we let Hadoop group the (key, value) pairs between the Map and the Reduce step because Hadoop is more efficient in this regard than our simple Python scripts."
'via Blog this'
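Note to self: the "sum only when the same word appears in succession" idea can be sketched with `itertools.groupby`, which groups consecutive equal keys only. This is a rough sketch and not the tutorial's exact script. The function names and the tab-separated `word<TAB>count` input format are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def parse(lines, separator="\t"):
    """Yield (word, count) pairs from 'word<sep>count' lines, skipping bad ones."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        word, _, count = line.partition(separator)
        try:
            yield word, int(count)
        except ValueError:
            continue  # the count was missing or not a number


def reduce_counts(lines, separator="\t"):
    """Sum counts over runs of identical consecutive words.

    groupby only merges neighbours, so a word that shows up again later
    starts a new group; sorted input from Hadoop avoids that.
    """
    for word, group in groupby(parse(lines, separator), key=itemgetter(0)):
        yield word, sum(count for _, count in group)
```

As a streaming reducer it would be fed `sys.stdin` and would print each `word` and total separated by a tab; Hadoop sorts by key between the Map and Reduce steps, so each word arrives as one consecutive run.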
Tuesday, November 27, 2012
[SOLVED] xhost Remote X apps. Can't get localhost X to work - Ubuntu Forums
[SOLVED] xhost Remote X apps. Can't get localhost X to work - Ubuntu Forums: "Code:
sudo cp /etc/X11/xinit/xserverrc /etc/X11/xinit/xserverrc.orig
Code:
sudo cp /etc/kde4/kdm/kdmrc /etc/kde4/kdm/kdmrc.orig
Code:
sudo cp /etc/gdm/gdm.conf /etc/gdm/gdm.conf.orig
"
'via Blog this'
Getting Started
Getting Started: "This example shows how to run Pig in local and mapreduce mode using the java command.
/* local mode */
$ java -cp pig.jar org.apache.pig.Main -x local ...
/* mapreduce mode */
$ java -cp pig.jar org.apache.pig.Main ...
or
$ java -cp pig.jar org.apache.pig.Main -x mapreduce ..."
'via Blog this'
BuildingMahout
BuildingMahout: "Working With Maven in Eclipse
We've used Eclipse Galileo and m2eclipse 0.9 and the 'import maven projects' feature. Check out the mahout sources into your workspace directory, do a full build on the command-line and then fire up the import in Eclipse from File > Import > Maven Projects. Point it at the mahout root directory. You are then given the opportunity to choose which sub-modules to import. You don't need to import them all, only the projects you are interested in working with.
"
'via Blog this'
Trello - Wikipedia, the free encyclopedia
Trello - Wikipedia, the free encyclopedia: "Trello is a Web based project management application from Fog Creek Software that can also be synced in real time with a Smartphone app. It was released at a TechCrunch event by software developer Joel Spolsky.[1] Wired magazine named the application in September, 2011 as one of "The 7 Coolest Startups You Haven’t Heard of Yet".[2] Lifehacker said it "makes project collaboration simple and kind of enjoyable".[3] In July, 2012, the site surpassed 500,000 users.[4]"
'via Blog this'
SequenceFile - Hadoop Wiki
SequenceFile - Hadoop Wiki: "SequenceFile is a flat file consisting of binary key/value pairs. It is extensively used in MapReduce as input/output formats. It is also worth noting that, internally, the temporary outputs of maps are stored using SequenceFile.
The SequenceFile provides Writer, Reader and Sorter classes for writing, reading and sorting respectively.
"
'via Blog this'
File Format Integrations
File Format Integrations: "Importer 'bin/mahout' jobs
Run these with --help to see options
bin/mahout arff.vector
bin/mahout lucene.vector
bin/mahout seqdirectory
turns text files into sequence files, one file per key/value pair
bin/mahout SequenceFilesFromMailArchives
parses mailboxes and emits one text body per mail message
bin/mahout regexconverter
reads text lines and emits the regex output lines into SequenceFiles.
"
'via Blog this'
Hadoop Tutorial - YDN
Hadoop Tutorial - YDN: "Introduction
Hadoop is an open source implementation of the MapReduce platform and distributed file system, written in Java. This module explains the basics of how to begin using Hadoop to experiment and learn from the rest of this tutorial. It covers setting up the platform and connecting other tools to use it.
"
'via Blog this'
hadoop - Converting CSV to SequenceFile - Stack Overflow
hadoop - Converting CSV to SequenceFile - Stack Overflow: "seqdirectory command takes every file as a document, so in reality, you only have one document, hence you only get one vector. To make it work properly you would make each line of your CSV file a file itself, where the key of the document is the name of the file and the value is its content. Nonetheless, this is quite impractical if your corpus is large as disk reading and writing can become painfully slow.
In practice you are better off following the links I share in this comment"
'via Blog this'
DailyJS: A JavaScript Blog
DailyJS: A JavaScript Blog: "When I originally wrote about prototypes in JS101: Prototypes a few people were confused that I didn’t mention the __proto__ property. One reason I didn’t mention it is I was sticking to standard ECMAScript for the most part, using the Annotated ECMAScript 5.1 site as a reference. It’s actually hard to talk about prototypes without referring to __proto__, though, because it serves a very specific and useful purpose."
'via Blog this'
Improving <canvas> performance – never underestimate copy and paste | Adventures (in code)
Improving <canvas> performance – never underestimate copy and paste | Adventures (in code): "Still not good enough
It was an improvement, but I was still looking at around 12fps in Chrome- not bad, but not ideal. With the getImageData() parts optimised as best I could manage, I looked at the next big drain- the initial drawing of blurred data points. Problem #2: what can I do that is any simpler than drawing a circle? Surprise #2: I can just copy and paste the same circle over and over."
'via Blog this'
Getting serious about SVG
Getting serious about SVG: "iD depends on SVG for drawing map features, displaying tiles, and a model upon which to build complex interactions.
In the process of building it, we’ve learned a lot about SVG’s performance equation - and it’s time to share some of this. A lot of this is taken from NOTES.md, a sort of developer-journal which has grown over the last few weeks."
'via Blog this'
Exclusive: Inside Google Spanner, the Largest Single Database on Earth | Wired Enterprise | Wired.com
Exclusive: Inside Google Spanner, the Largest Single Database on Earth | Wired Enterprise | Wired.com: "VC is Google shorthand for video conference. Looking up at the screen on his desk, Fikes could see Wilson Hsieh sitting inside a Google office in Manhattan, and Hsieh could see him. They also ran VC links to a Google office in Kirkland, Washington, near Seattle. Their engineering team spanned three offices in three different parts of the country, but everyone could still chat and brainstorm and troubleshoot without a moment’s delay, and this is how Google built Spanner."
'via Blog this'
Jank Busting for Better Rendering Performance - HTML5 Rocks
Jank Busting for Better Rendering Performance - HTML5 Rocks: "INTRODUCING V-SYNC
PC gamers might be familiar with this term, but it's uncommon on the web: what is v-sync?
Consider your phone's display: it refreshes on a regular interval, usually (but not always!) about 60 times a second. V-sync (or vertical synchronization) refers to the practice of generating new frames only between screen refreshes. You might think of this like a race condition between the process that writes data into the screen buffer and the operating system reading that data to put it on the display. We want the buffered frame contents to change in between these refreshes, not during them; otherwise the monitor will display half of one frame and half of another, leading to "tearing"."
'via Blog this'
Monday, November 26, 2012
Large Scale Machine Learning and Other Animals: Mahout - SVD matrix factorization - formatting input matrix
Large Scale Machine Learning and Other Animals: Mahout - SVD matrix factorization - formatting input matrix: "Converting Input Format into Mahout's SVD Distributed Matrix Factorization Solver
Purpose
The code below converts a matrix from csv format:
,,\n
Into Mahout's SVD solver format.
"
'via Blog this'
Saturday, November 24, 2012
SSTable and Log Structured Storage: LevelDB - igvita.com
SSTable and Log Structured Storage: LevelDB - igvita.com: "If Protocol Buffers is the lingua franca of individual data record at Google, then the Sorted String Table (SSTable) is one of the most popular outputs for storing, processing, and exchanging datasets. As the name itself implies, an SSTable is a simple abstraction to efficiently store large numbers of key-value pairs while optimizing for high throughput, sequential read/write workloads."
'via Blog this'
SequenceFile (Apache Hadoop Main 2.0.2-alpha API)
SequenceFile (Apache Hadoop Main 2.0.2-alpha API): "SequenceFiles are flat files consisting of binary key/value pairs.
SequenceFile provides Writer, Reader and SequenceFile.Sorter classes for writing, reading and sorting respectively."
'via Blog this'
Hadoop Tutorial Series, Issue #1: Setting Up Your MapReduce Learning Playground | My Blog by Philippe Adjiman
Hadoop Tutorial Series, Issue #1: Setting Up Your MapReduce Learning Playground | My Blog by Philippe Adjiman: "export M2_HOME=/usr/local/apache-maven
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
Then from the same terminal go into the workspace directory (usually located at ~/workspace) and create a java project hierarchy using the following maven command (change the groupId and the artifactId as you like):
"
'via Blog this'
Maven: Building a Self-Contained Hadoop Job | Matthias Friedrich's Blog
Maven: Building a Self-Contained Hadoop Job | Matthias Friedrich's Blog: "That’s it, we’re done. You can now build your job JAR:
mvn clean package
Your self-contained job JAR is the file in target ending with -job.jar. Run it using Hadoop’s jar sub-command:"
'via Blog this'
Friday, November 23, 2012
Getting Started with Web Applications
Getting Started with Web Applications: "A web application is a dynamic extension of a web or application server. There are two types of web applications:
Presentation-oriented: A presentation-oriented web application generates interactive web pages containing various types of markup language (HTML, XML, and so on) and dynamic content in response to requests. Chapters 11 through 22 cover how to develop presentation-oriented web applications.
Service-oriented: A service-oriented web application implements the endpoint of a web service. Presentation-oriented applications are often clients of service-oriented web applications. Chapters 8 and 9 cover how to develop service-oriented web applications."
'via Blog this'
Why isn't Java used for modern web application development? - Programmers
Why isn't Java used for modern web application development? - Programmers: "Java absolutely is used for modern web application development. Particularly once you get to the slightly larger / more complex / scalable end of the web application spectrum.
If you are interested in modern, productive tools and frameworks take a look at:
The Play framework
Google Web Toolkit
Vaadin
Tapestry 5
But I think most truly modern web development on the JVM platform is likely to be done in one of the new JVM languages rather than using Java directly, with Java simply providing the backbone in terms of underlying libraries and back-end infrastructure. There is a lot of web development happening in Groovy (Grails), Scala (Lift), JRuby (JRuby on Rails) and Clojure (Noir, Ring/Enlive+lots of custom frameworks) to name but a few.
With all the innovation happening in the new JVM language space, I personally suspect that Java will ultimately become the "assembler of server-side programming"."
'via Blog this'
Can anyone recommend a simple Java web-app framework? - Stack Overflow
Can anyone recommend a simple Java web-app framework? - Stack Overflow: "I'm trying to get started on what I'm hoping will be a relatively quick web application in Java, yet most of the frameworks I've tried (Apache Wicket, Liftweb) require so much set-up, configuration, and trying to wrap my head around Maven while getting the whole thing to play nice with Eclipse, that I spent the whole weekend just trying to get to the point where I write my first line of code!"
'via Blog this'
musicg - Lightweight Java API for audio analysing, Android compatible - Google Project Hosting
musicg - Lightweight Java API for audio analysing, Android compatible - Google Project Hosting: "musicg is a lightweight audio analysis library, written in Java, with the purpose of extracting both high level and low level audio features.
This API allows developers to extract audio features and operate on audio data (reading, cutting and trimming) easily from an input stream. It also provides tools for digital signal processing and renders the waveform or spectrogram for research and development purposes.
The API is Android compatible."
'via Blog this'
Java Audio Feature Extraction @ IFS, Vienna University of Technology
Java Audio Feature Extraction @ IFS, Vienna University of Technology: "The Java Audio Feature Extraction is developed and maintained at the Institute of Software Technology and Interactive Systems at the Vienna University of Technology, as a research prototype.
The Java Audio Feature Extraction is licensed under the Apache License, Version 2.0., and you are free to use the software for any kind of purpose that conforms with the license."
'via Blog this'
jMIR | Free Audio & Video software downloads at SourceForge.net
jMIR | Free Audio & Video software downloads at SourceForge.net: "jMIR is intended for use in music information retrieval research involving the study of music in both audio and symbolic formats. The jMIR suite includes software for performing feature extraction, applying data mining algorithms and managing metadata."
'via Blog this'
EC2 costs 4 times higher than running internal cluster
EC2 costs 4 times higher than running internal cluster: "The monthly bills however became more and more eye-popping ($70,000/month and growing), and some rough back of the envelope calculations led me to believe that what we were paying for storage and compute was excessive.
The long and the short of it is that Amazon’s EC2 service is 380% more expensive than running our own hardware. Of course EC2 can be provisioned on demand, but such a large multiple certainly makes having an internal cluster a key part of our ongoing Hadoop strategy. Read on for our story…
"
'via Blog this'
Thursday, November 22, 2012
A Few New Things Coming To JavaScript
A Few New Things Coming To JavaScript: "An export declaration declares that a local function or variable binding is visible externally to other modules. If familiar with the module pattern, think of this concept as being parallel to the idea of exposing functionality publicly."
'via Blog this'
Maven - Maven in 5 Minutes
Maven - Maven in 5 Minutes: "The pom.xml file is the core of a project's configuration in Maven. It is a single configuration file that contains the majority of information required to build a project in just the way you want. The POM is huge and can be daunting in its complexity, but it is not necessary to understand all of the intricacies just yet to use it effectively. This project's POM is:"
'via Blog this'
Wednesday, November 21, 2012
Installing Oracle Java7 JDK on Ubuntu 12.04 | digital nomad
Installing Oracle Java7 JDK on Ubuntu 12.04 | digital nomad: "Installing Oracle Java7 JDK on Ubuntu 12.04
Posted on May 15, 2012
If you really need Oracle Java (some applications seem to insist on it) on Ubuntu here is the procedure using a PPA.
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
"
'via Blog this'
File Format Integrations
File Format Integrations: "Importer 'bin/mahout' jobs
Run these with --help to see options
bin/mahout arff.vector
bin/mahout lucene.vector
bin/mahout seqdirectory
turns text files into sequence files, one file per key/value pair
bin/mahout SequenceFilesFromMailArchives
parses mailboxes and emits one text body per mail message
bin/mahout regexconverter
reads text lines and emits the regex output lines into SequenceFiles."
'via Blog this'
Tuesday, November 20, 2012
Hama - a Bulk Synchronous Parallel computing framework on top of Hadoop
Hama - a Bulk Synchronous Parallel computing framework on top of Hadoop: "Why Hama and BSP?
Today, many practical data processing applications require a more flexible programming abstraction model that is compatible to run on highly scalable and massive data systems (e.g., HDFS, HBase, etc). A message passing paradigm beyond Map-Reduce framework would increase its flexibility in its communication capability. Bulk Synchronous Parallel (BSP) model fills the bill appropriately. Some of its significant advantages over MapReduce and MPI are:
Supports message passing paradigm style of application development
Provides flexible, simple, and easy-to-use small APIs
Can perform better than MPI for communication-intensive applications
Guarantees impossibility of deadlocks or collisions in the communication mechanisms"
'via Blog this'
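To make the BSP idea concrete, here is a minimal single-process sketch in Python. It is not Hama's API: the function name, the ring topology, and the global-maximum task are all assumptions chosen for illustration. Each superstep has the three BSP phases: local computation, message sending, and a barrier after which the messages become visible.

```python
def bsp_global_max(local_values):
    """Simulate BSP supersteps until every peer knows the global maximum.

    Hypothetical illustration only: peers are list indices arranged in a
    ring, and each peer may send one message to its right-hand neighbour.
    """
    n = len(local_values)
    known = list(local_values)          # each peer's current best value
    inboxes = [[] for _ in range(n)]    # messages delivered at the last barrier
    superstep = 0
    while True:
        outboxes = [[] for _ in range(n)]
        changed = False
        for peer in range(n):
            # Phase 1: local computation on messages from the previous superstep.
            best = max([known[peer]] + inboxes[peer])
            if best != known[peer] or superstep == 0:
                known[peer] = best
                changed = True
                # Phase 2: communication; the message is only queued here.
                outboxes[(peer + 1) % n].append(best)
        # Phase 3: barrier synchronisation. Queued messages become visible
        # to their receivers only in the next superstep.
        inboxes = outboxes
        superstep += 1
        if not changed:
            break
    return known, superstep


if __name__ == "__main__":
    values, steps = bsp_global_max([3, 9, 4])
    print(values, "after", steps, "supersteps")
```

Because messages are exchanged only at the barrier, no peer ever waits on another peer in the middle of a superstep, which is the intuition behind the deadlock-freedom claim above.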
Creating Vectors from Weka's ARFF Format
Creating Vectors from Weka's ARFF Format: "Introduction
Mahout now has capabilities for converting Weka's ARFF (2.1) format to Mahout's Vector format."
'via Blog this'
Monday, November 19, 2012
Hadoop - Icbwiki
Hadoop - Icbwiki: "
Name                Port  Description
fs.default.name     9000  The port that the name node listens on.
mapred.job.tracker  9001  The port that the MapReduce job tracker listens on."
'via Blog this'
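As a rough sketch of how these two settings are usually written down, the Python below builds the property values and renders them as Hadoop-style configuration XML. The host name and the helper function names are assumptions for illustration. In a Hadoop 1.x install the two properties normally live in separate files (core-site.xml and mapred-site.xml); they are combined here only to keep the example short.

```python
import xml.etree.ElementTree as ET


def hadoop_properties(host="localhost", namenode_port=9000, jobtracker_port=9001):
    """Return the two property values from the table above as a dict.

    The host name is a placeholder; the ports default to the table's values.
    """
    return {
        # The name node address is written as an HDFS URI.
        "fs.default.name": f"hdfs://{host}:{namenode_port}",
        # The job tracker address is a plain host:port pair.
        "mapred.job.tracker": f"{host}:{jobtracker_port}",
    }


def to_configuration_xml(properties):
    """Render properties in the <configuration>/<property> layout Hadoop reads."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_configuration_xml(hadoop_properties()))
```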
Friday, November 09, 2012
MacBook (Early 2008 and Late 2008) - Technical Specifications
MacBook (Early 2008 and Late 2008) - Technical Specifications: "MacBook (Early 2008 and Late 2008) - Technical Specifications"
'via Blog this'
System requirements for OS X Mountain Lion
System requirements for OS X Mountain Lion: "OS X Mountain Lion system requirements
To install Mountain Lion, you need one of these Macs:
iMac (Mid 2007 or newer)
MacBook (Late 2008 Aluminum, or Early 2009 or newer)
MacBook Pro (Mid/Late 2007 or newer)
MacBook Air (Late 2008 or newer)
Mac mini (Early 2009 or newer)
Mac Pro (Early 2008 or newer)
Xserve (Early 2009)"
'via Blog this'
Introducing smap.js, a forward polyfill for ES6 Maps
Introducing smap.js, a forward polyfill for ES6 Maps: "Boris Smus makes an excellent suggestion for moving the web forward: forward polyfills. That’s exactly my intention with smap.js. I’m hoping you will think this is a great idea, and help discuss how ES6 Map should work or submit pull requests with your own ideas."
'via Blog this'
How Zara Grew Into the World’s Largest Fashion Retailer - NYTimes.com
How Zara Grew Into the World’s Largest Fashion Retailer - NYTimes.com: "“When we open a market, everyone asks, ‘How many stores will you open?’ ” he said. “Honestly, I didn’t know. It depends on the customer and how big the demand is. We must have the dialogue with the customers and learn from them. It’s not us saying you must have this. It’s you saying it.”"
'via Blog this'
Thursday, November 08, 2012
Tuesday, November 06, 2012
T-Complexity and T-Information Theory -- an Executive Summary
T-Complexity and T-Information Theory -- an Executive Summary: "T-Complexity and T-Information Theory -- an Executive Summary
Gunther, Ulrich
Identifier: http://hdl.handle.net/2292/3657
Issue Date: 2001-02
Reference: CDMTCS Research Reports CDMTCS-149 (2001)
Rights: The author(s)
Rights (URI): https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm
Abstract:
This paper describes the derivation of the T-Complexity and T-Information
Theory from the decomposition of finite strings, based on the
duality of strings and variable-length T-Codes. It further outlines its similarity
to the string parsing algorithm by Lempel and Ziv. It is intended
as a summary of work published mainly by Titchener and Nicolescu.
"
'via Blog this'
Monday, November 05, 2012
The Setup / Rob Pike
The Setup / Rob Pike: "My dream setup, then, is a computing world where I don't have to carry at least three computers - laptop, tablet, phone, not even counting cameras and iPod and other oddments - around with me in order to function in the modern world. The world should provide me my computing environment and maintain it for me and make it available everywhere. If this were done right, my life would become much simpler and so could yours."
'via Blog this'
Writing Fast, Memory-Efficient JavaScript | Smashing Coding
Writing Fast, Memory-Efficient JavaScript | Smashing Coding: "JavaScript engines such as Google’s V8 (Chrome, Node) are specifically designed for the fast execution of large JavaScript applications. As you develop, if you care about memory usage and performance, you should be aware of some of what’s going on in your user’s browser’s JavaScript engine behind the scenes."
'via Blog this'
Coding Horror: Software Developers and Asperger's Syndrome
Coding Horror: Software Developers and Asperger's Syndrome: "One provocative hypothesis that might account for the rise of spectrum disorders in technically adept communities like Silicon Valley, some geneticists speculate, is an increase in assortative mating. Superficially, assortative mating is the blond gentleman who prefers blondes; the hyperverbal intellectual who meets her soul mate in the therapist's waiting room. There are additional pressures and incentives for autistic people to find companionship - if they wish to do so - with someone who is also on the spectrum. Grandin writes, "Marriages work out best when two people with autism marry or when a person marries a handicapped or eccentric spouse.... They are attracted because their intellects work on a similar wavelength.""
'via Blog this'
Sunday, November 04, 2012
Saturday, November 03, 2012
Backbone.js for Django Developers | Blog | Django Development | Lincoln Loop
Backbone.js for Django Developers | Blog | Django Development | Lincoln Loop: "After a month or two of being submersed in Backbone, I’ve seen the light and am now relatively competent. When I got started I was completely lost. Here are some things that probably would have helped me back then. Whenever possible, I’ve linked to the source of the Backbone Todo example for code samples."
'via Blog this'
Getting Started with Tastypie — Tastypie 0.9.12-alpha documentation
Getting Started with Tastypie — Tastypie 0.9.12-alpha documentation: "# urls.py
from django.conf.urls.defaults import *
from myapp.api import EntryResource
entry_resource = EntryResource()
urlpatterns = patterns('',
    # The normal jazz here...
    (r'^blog/', include('myapp.urls')),
    (r'^api/', include(entry_resource.urls)),
)"
'via Blog this'
PaulUithol/backbone-tastypie
PaulUithol/backbone-tastypie: "Backbone-tastypie
A small conversion layer to make backbone.js and django-tastypie work together happily.
"
'via Blog this'
Friday, November 02, 2012
python - How do you serialize a model instance in Django? - Stack Overflow
python - How do you serialize a model instance in Django? - Stack Overflow: "You can simply wrap the object in a list, which is all Django's serializers need to serialize it correctly, e.g.:
from django.core import serializers
# assuming obj is a model instance
serialized_obj = serializers.serialize('json', [ obj, ])"
'via Blog this'
Thursday, November 01, 2012
Dropbox-as-a-Database | The Opa Blog
Dropbox-as-a-Database | The Opa Blog: "We played with the concept a bit, and, in an era which is also the one of cloud storage with Dropbox, Box, Google Drive, Skydrive and the like, we wondered why applications and services shouldn't just use our cloud storage account to store our data. Why should everything be centralized? Why do all applications and services behave like Mega and not like BitTorrent?
"
'via Blog this'
DOE flips switch on Titan, world’s newest fastest supercomputer | Ars Technica
DOE flips switch on Titan, world’s newest fastest supercomputer | Ars Technica: "The Department of Energy's Oak Ridge National Labs today powered up Titan, a new supercomputer with 299,008 CPU cores, 18,688 GPUs, and more than 700 terabytes of memory. Titan is capable of a peak speed of 27 quadrillion calculations per second (petaflops)—ten times the processing power of its predecessor at Oak Ridge—and will likely unseat DOE's Sequoia supercomputer (an IBM BlueGene/Q system at Lawrence Livermore National Laboratory) as the fastest in the world."
'via Blog this'
'via Blog this'
The Setup / Joey Hess
The Setup / Joey Hess: "This place is nicely remote, and off the grid, relying on solar power. I only get 50 amp-hours of juice on a sunny day, and often less than 15 amp-hours on a bad day. So the whole house runs on 12 volt DC power to avoid the overhead of an inverter; my laptop is powered through a succession of cheap vehicle power adapters, and my home server runs on 5 volt power provided by a USB adapter.
When power is low, I often hack in the evenings by lantern light."
'via Blog this'
UbuntuTime - Community Ubuntu Documentation
UbuntuTime - Community Ubuntu Documentation: "Using the Command Line (terminal)
Using the command line, you can use dpkg-reconfigure tzdata.
Open a terminal window by going to Applications>Accessories>Terminal
dpkg-reconfigure tzdata
Follow the directions in the terminal."
'via Blog this'
Standard score - Wikipedia, the free encyclopedia
Standard score - Wikipedia, the free encyclopedia: "The standard score of a raw score x is
z = (x − μ) / σ
where:
μ is the mean of the population;
σ is the standard deviation of the population."
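A small Python sketch of the definition above, assuming the given list is the whole population (so the standard deviation divides by n rather than n − 1). The function name and the sample data are illustrative only.

```python
import math


def standard_scores(values):
    """Return the standard score (x - mu) / sigma for each raw score.

    Treats `values` as the full population: sigma is the population
    standard deviation (divide by n, not n - 1).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        raise ValueError("standard deviation is zero; scores are undefined")
    return [(x - mu) / sigma for x in values]


if __name__ == "__main__":
    # Population with mean 5 and standard deviation 2.
    print(standard_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```

A score of 0 means the raw value equals the population mean; a score of 2 means it lies two standard deviations above it.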