April 4, 2017 by Chris on Web Bookmarks

Bookmarks for April 3rd through April 4th

These are my links for April 3rd through April 4th:

Python for Business: Identifying Duplicate Data – 33 Sticks – Data Preparation is one of those critical tasks that most digital analysts take for granted, as many of the analytics platforms we use take care of this task for us, or at least we like to believe they do. With that said, Data Preparation should be a task that every good analyst completes as part of any data investigation.
Wes McKinney, author of Python for Data Analysis, defines Data Preparation as “cleaning, munging, combining, normalizing, reshaping, slicing, dicing, and transforming data for analysis.”
In this post, I am going to walk you through a real world example, focusing on Data Preparation, of how Python can be a very powerful tool for business focused data analysis.
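A minimal sketch of the duplicate-identification step named in the title, using only the standard library. It is not the linked post's code: the function name `find_duplicates`, the `key_fields` parameter, and the choice to normalize values by trimming whitespace and lowercasing are all assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicates(records, key_fields):
    """Group record indexes by a normalized key and keep groups of two or more.

    records    -- list of dicts, one per row (hypothetical input shape)
    key_fields -- field names that together identify a duplicate (assumption)
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        # Normalize each key value so "A@x.com " and "a@x.com" compare equal.
        key = tuple(str(record[field]).strip().lower() for field in key_fields)
        groups[key].append(index)
    # Only keys seen more than once are duplicates.
    return {key: indexes for key, indexes in groups.items() if len(indexes) > 1}


if __name__ == "__main__":
    rows = [
        {"email": "A@x.com ", "name": "Ann"},
        {"email": "a@x.com", "name": "Ann B."},
        {"email": "b@x.com", "name": "Bob"},
    ]
    print(find_duplicates(rows, ["email"]))
```

Returning row indexes rather than deleting rows leaves the decision about which copy to keep to the analyst.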
Data Mining: Finding Similar Items and Users – To find similar items to a certain item, you've got to first define what it means for 2 items to be similar and this depends on the problem you're trying to solve:
on a blog, you may want to suggest similar articles that share the same tags, or that have been viewed by the same people viewing the item you want to compare with
Amazon has this section called "customers that bought this item also bought", which is self-explanatory
a service like IMDB, based on your ratings, could find users similar to you, users that liked or hated approximately the same movies you did, thus giving you suggestions on movies you'd like to watch in the future
In each case you need a way to classify these items you're comparing, whether it is tags, or items purchased, or movies reviewed. We'll be using tags, as it is simpler, but the formula holds for more complicated instances.
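The excerpt does not say which formula the linked article uses. One common choice for comparing tag sets is the Jaccard index (shared tags divided by all distinct tags), sketched here as an assumption rather than as the article's own method. The function name and the sample tags are hypothetical.

```python
def jaccard_similarity(tags_a, tags_b):
    """Return |A intersect B| / |A union B| for two collections of tags."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        # Two untagged items share nothing measurable; treat as dissimilar.
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    post_one = ["python", "data"]
    post_two = ["python", "web"]
    # One shared tag out of three distinct tags.
    print(jaccard_similarity(post_one, post_two))
```

The same function works for purchased items or reviewed movies, since it only needs two sets of identifiers.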
Implementing the Five Most Popular Similarity Measures in Python – Dataconomy – Similarity is the measure of how much alike two data objects are. Similarity in a data mining context is usually described as a distance with dimensions representing features of the objects. If this distance is small, there will be a high degree of similarity; if the distance is large, there will be a low degree of similarity. Similarity is subjective and is highly dependent on the domain and application. For example, two fruits may be similar because of color, size, or taste. Care should be taken when calculating distance across dimensions/features that are unrelated. The relative values of each feature must be normalized, or one feature could end up dominating the distance.
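To illustrate the point about normalization, here is a small sketch that rescales each feature to the range 0 to 1 before taking a Euclidean distance. Min-max scaling is one normalization among several and is an assumption here, as are the function names and the sample numbers.

```python
import math


def min_max_normalize(rows):
    """Rescale every column of a list of equal-length numeric rows to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            (value - low) / (high - low) if high != low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    # Second feature is on a much larger scale than the first.
    raw = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
    scaled = min_max_normalize(raw)
    print(euclidean_distance(raw[0], raw[2]))        # dominated by feature 2
    print(euclidean_distance(scaled[0], scaled[2]))  # both features contribute
```

Without the rescaling step, the large-valued feature accounts for almost all of the distance.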
Cosine Similarity Part 1: The Basics – Algorithms for Big Data – The business use case for cosine similarity involves comparing customer profiles, product profiles or text documents. The algorithmic question is whether two customer profiles are similar or not. Cosine similarity is perhaps the simplest way to determine this.
If one can compare whether any two objects are similar, one can use the similarity as a building block to achieve more complex tasks, such as:

search: find the most similar document to a given one
classification: is some customer likely to buy that product
clustering: are there natural groups of similar documents
product recommendations: which products are similar to the customer’s past purchases
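A minimal sketch of cosine similarity and of the "search" use listed above, assuming each profile or document has already been turned into a numeric vector of equal length. The helper `most_similar` and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; report no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, candidates):
    """Return the name of the candidate vector closest in angle to query."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


if __name__ == "__main__":
    documents = {"doc_a": [1.0, 0.0, 2.0], "doc_b": [0.0, 3.0, 0.0]}
    print(most_similar([2.0, 0.1, 4.0], documents))
```

Because the measure depends only on direction, a vector and any positive multiple of it score as fully similar.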
Harry Potter and the Methods of Rationality | Petunia married a professor, and Harry grew up reading science and science fiction. –

March 30, 2017 by Chris on Web Bookmarks

Bookmarks for March 30th

March 25, 2017 by Chris on Web Bookmarks

Bookmarks for March 24th through March 25th

These are my links for March 24th through March 25th:

March 17, 2017 by Chris on Web Bookmarks

Bookmarks for March 16th through March 17th

These are my links for March 16th through March 17th:

March 12, 2017 by Chris on Web Bookmarks

Bookmarks for March 12th

These are my links for July 20th through July 24th:

Ask HN: Best-architected open-source business applications worth studying? | Hacker News -
Monospaced Programming Fonts with Ligatures | Hacker News -
The language of choice - Propositional logic was discovered by Stoics around 300 B.C., only to be abandoned in later antiquity and rebuilt in the 19th century by George Boole’s successors. One of them, Charles Peirce, saw its significance for what we now call logic circuits, yet that discovery too was forgotten until the 1930s. In the ’50s John McCarthy invented conditional expressions, casting the logic into the form we’ll study here; then in 1986 Randal Bryant repeated one of McCarthy’s constructions with a crucial tweak that made his report “for many years the most cited paper in all of computer science, because it revolutionized the data structures used to represent Boolean functions” (Knuth). Let’s explore and code up some of this heritage of millennia, and bring it to bear on a suitable challenge: playing tic-tac-toe.
Then we’ll tackle a task that’s a little more practical: verifying a carry-lookahead adder circuit. Supposedly logic gets used all the time for all kinds of serious work, but for such you’ll have to consult the serious authors; what I can say myself, from working out the code to follow, is that the subject offers a fun playground plus the most primitive form of the pun between meaning and mechanism.

You’re encouraged to read with this article’s code cloned and ready

Chris's Digital Detritus

Videotext for the twenty first century.

python